THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 


Volume XXXII February, 1941 Number 2 


A CULTURE-FREE INTELLIGENCE TEST: 
II. EVALUATION OF CULTURAL INFLUENCE 
ON TEST PERFORMANCE 


RAYMOND B. CATTELL, S. NORMAN FEINGOLD, AND 
SEYMOUR B. SARASON 


Clark University 
I. THE FACTOR OF CULTURAL INFLUENCE AND ITS ISOLATION 


A hypothetically culture-free intelligence test, its construction and 
the principles of that construction have been described in a previous 
article.¹ The present contribution purposes, first, to discuss critically 
analytical methods of appraising cultural influences in tests, and, 
secondly, to present results concerning the relative cultural influence in 
performance on the present test and on certain standard tests such as 
the Binet. 

The problem of estimating cultural influences in intelligence test 
performance is not identical with that of determining the ratio of 
hereditary and environmental factors in intelligence, but it includes 
the latter problem and cannot profitably be discussed in abstraction 
from it. 

For, if we agree to use the term intelligence and to speak of a single 
or compound “general ability,” the variations among individuals in 
their test scores in an intelligence test can be regarded as depending on: 

(1) Variations in the innate gene endowment which is responsible 
for the magnitude of this general ability, perhaps, e.g., in the genes 
defining the sum total of cerebral neurones. 

(2) Variations in environmentally (i.e., post-conceptually) pro- 
duced development of the general ability. 

(3) Variations in the closeness of the individual’s cultural training 
and experience to the cultural medium in which the test is expressed. 


1 Cattell, R. B.: “A culture-free intelligence test I.” J. educ. Psychol., Vol. 
1940, pp. 161-180. 




(4) Variations in familiarity with tests and test situations, test 
training or “test sophistication.”¹ Several slightly different and 
experimentally distinguishable types of preparedness are involved 
here. 

(5) Fluctuations in the underlying general ability itself, through 
physiological and other variables. 

(6) Fluctuations in the effective expression or application of the 
ability through varying strength and direction of volition. 

(7) Chance errors in measurement not included in the above. 

Resorting to a formula, for facility in later discussions, we may say 
that any performance P in an intelligence test is a function of the fac- 
tors in the following algebraic equation: 

P = G + dG + c + t + f + fv + e + K 

where G is the innate ability; dG the environmental change; c the gain 
through the test being in terms of information or skills acquired by 
the subject (i.e., cultural appositeness); t the effect of specific training 
in intelligence test situations; f the “function fluctuation”² through 
physiological, fatigue, and other effects; fv the function fluctuation 
through changes in will and interest; e the experimental error in the 
measurement, resulting from causes too trivial for inclusion under 
systematic headings. K is a factor to cover special abilities, i.e., group 
and specific factors, existing in the intelligence test, and their vari- 
ability, which might correspondingly be considered under seven head- 
ings. Since the treatment of variables which follows applies in the 
main pari passu to the special ability factors, and since we are not 
specifically interested in this aspect of the problem, we shall make no 
further reference to K. Nor are we immediately concerned with 
function fluctuation (f + fv), but it seems best to envisage the setting 
of the whole problem before dealing with the segment which concerns 
us. 

Conceivably there are other variables which could be included, 
notably a speed-with-age factor³ which might not be included in 
(f + fv) because it has a systematic and not a fluctuant trend; but, 


1 Vernon, P. E.: “Intelligence test sophistication.” Brit. J. educ. Psychol., 
Vol. 1938, pp. 237-244. 

2 This term was proposed by R. A. Thouless at the Psychology Section of the 
British Association in 1937. 

3 Lorge, I.: “The influence of the test upon the nature of mental decline as a 
function of age.” J. educ. Psychol., Vol. XXVII, 1936, pp. 100-110. 




with such debatable exceptions, the equation analyzes in accordance 
with what is at present known or suspected in regard to variations in 
intelligence test performance. If experiment should prove one of 
these factors to be non-existent, or not to be entirely independent of 
others, that fact will be revealed in the equations. It would seem 
best, in other words, to start with too fine, rather than too broad, a 
hypothetical structure. 

The chief assumption is that there exists a general ability (G + dG) 

which can be assessed in abstraction from the particular tests which 
are used as its indicators. That is to say: Each individual is con- 
sidered to have a particular capacity in perceiving complexity of 
relations, which exists independently of the particular field of skill or 
knowledge in which the individual comes most fully to exercise the 
ability. It is something which can be conceived in abstraction from 
the field in which it is measured and as potential to another field, as 
energy can be conceived and calculated apart from the particular 
physical, chemical, or electrical system in which it happens to be 
resident. Only on this assumption is it possible to speak of an engi- 
neer and a lawyer as having equal intelligence (apart from special 
aptitudes) or of an Englishman and a Frenchman having the same 
amount of general ability, though the level of that ability is expressed 
in the relationships and skills of two different languages in the two 
cases. 
If this detachment of the power from its manifestations is possible 
(by calculation and abstraction), then it is correct to ask how far the 
power, as such, can be impaired or augmented by environmental 
influences, apart from the actual gains in knowledge of, and familiarity 
with, the cultural medium in which the test is fashioned. This empha- 
sizes that the environmentally produced change in intelligence, dG, 
is in the subject himself, whereas his improvement in performance 
through education in the language, symbolism, or manual skills 
involved in the test—c—exists only in the relation between the indi- 
vidual and the test. That c and dG are essentially distinct will 
also be realized in reflecting that increases in c can result only from 
mental influences, whereas the ability itself may be affected environ- 
mentally through physiological changes, e.g., encephalitis. 

Unfortunately this distinction has not been observed in many 
statements about the rôles of heredity and environment; indeed not 
only dG and c, but also dG, c, t, and even the remaining smaller factors 
have been confused. “Environmental influence on intelligence” is 




strictly dG, while c, t, and in some respects f and fv fall under the head- 
ing “Environmental influence on ability to score in a particular intelli- 
gence test.” 
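The distinction drawn here can be illustrated with a minimal sketch. It assumes, purely for illustration, that the factors combine additively; the class name, method names, and numbers are hypothetical and do not come from the article, and the special-ability factor K is left out.

```python
from dataclasses import dataclass


@dataclass
class ScoreComponents:
    """Hypothetical additive breakdown of one test performance P (assumption)."""
    G: float    # innate general ability
    dG: float   # environmentally produced change in the ability itself
    c: float    # cultural familiarity with the test medium
    t: float    # test sophistication and practice
    f: float    # function fluctuation (physiological, fatigue)
    fv: float   # function fluctuation (will and interest)
    e: float    # chance error of measurement

    def general_ability(self) -> float:
        # "Environmental influence on intelligence" acts only through dG.
        return self.G + self.dG

    def test_specific_influence(self) -> float:
        # Influences on ability to score in one particular test: c, t, f, fv.
        return self.c + self.t + self.f + self.fv

    def performance(self) -> float:
        # Assumed additive combination of all seven factors.
        return self.general_ability() + self.test_specific_influence() + self.e


# Illustrative values only.
s = ScoreComponents(G=100.0, dG=2.0, c=5.0, t=3.0, f=-1.0, fv=0.5, e=0.25)
print(s.general_ability(), s.test_specific_influence(), s.performance())
```

The point of the split is that two subjects with the same value of `general_ability()` may still differ in `performance()` on a given test solely through the test-specific terms.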

The present research is concerned to isolate c, the influence on 
intelligence test scores of cultural familiarity with the test medium. 
To evaluate our seven factors we do not, fortunately, need to produce 
seven simultaneous equations; for it is possible to devise experimental 
situations which will keep most quantities constant and give us vari- 
ation in each factor in turn. 

Thus, for example, a split-half consistency coefficient is a guide to 
the magnitude of e. A comparison of this with the consistency coeffi- 
cient when a time interval intervenes will give a measure of f. The 
difference between variations of this magnitude and the variations 
arising when different motivations are employed will give a measure of 
fv. Sophistication t may be evaluated by taking culturally homo- 
geneous groups and practising some of them intensively in intelligence 
test situations, and so on. 

But the methodology of separating dG from c is not so simple, for 
familiarity with a culture cannot be obtained in a few days, or in so 
short a period that the possibility of a real growth of intelligence 
through exercise, dG, can be ruled out as negligible. Because the task 
of separating c appears to be in fact coextensive with no lesser problem 
than that of determining exactly the relative influence of nature and 
nurture upon intelligence we must turn aside for the space of one sec- 
tion to that problem. 


II. METHODOLOGY IN NATURE-NURTURE INVESTIGATION 


Advances in knowledge of mental heredity are resolvable into 
statements as to the Mendelian mechanisms governing a certain mental 
character (e.g., as has been established for Huntington’s chorea) or 
statements of a more statistical nature defining the relative contribu- 
tions of hereditary and environmental variances, as they exist in a given 
community, to the variance of a mental measurement (e.g., intelli- 
gence,¹ surgency²). Statements of the former sort are as yet few, 


1 Burks, B. S.: “Nature and nurture: their influence upon intelligence.” 
Twenty-Seventh Yearbook National Society for Study Educ., 1928, pp. 219-316. 
Cattell, R. B., and Wilson, J. L.: “Contributions concerning mental inherit- 
ance: I. Of intelligence.” Brit. J. educ. Psychol., Vol. VIII, 1938, pp. 129-149. 
2 Cattell, R. B., and Molteno, V.: “Contributions concerning mental inherit- 
ance: II. Of temperament.” J. genet. Psychol., Vol. LVII, 1940, pp. 31-47. 




being largely dependent on luck in finding complete genealogies and on 
astute clinical observation, so that most research has aimed at statis- 
tical findings concerning the nature-nurture variance ratio. 

The following procedures are the principal ones in vogue for the 
latter purpose: 

Comparison of mean differences or correlations of— 

(1) Identical twins (a) reared apart or (b) reared together with 
individuals at random in pairs. 

(2) (a) Identical twins with those of (b) fraternal twins. 

(3) (a) Siblings reared together with (b) siblings reared apart, and 
(c) individuals at random in pairs. 

(4) (a) Siblings reared together with (b) individuals at random 
reared together (foster siblings). 

(5) (a) Parents with (b) own children, and (c) foster children. 

(6) Comparison of developmental rates, mean IQ’s, or variance of 
IQ’s of groups of unrelated chance sampled individuals subject, in 
control and experimental groups, to deliberately varied environmental 
conditions. 

Unfortunately, hardly one of these methods is free from objections 
as to theoretical design or practical feasibility. Method 1 is ideal, 
but no one has yet succeeded in finding a sufficient sample of twins 
reared apart. Method 2 has disadvantages already pointed out else- 
where by one of the present writers.¹ As used by Hogben,² it involves 
the assumption that the environment of identical twins is no more 
similar than that of fraternal twins. In any case the comparison of 
identical and fraternal twin differences yields a ratio express- 
ing the relative potencies only of intra-familial hereditary variation 
(segregation) and inter-twin environment difference, a somewhat 
meaningless value in relation to community variances. 

Methods 4 and 5 have been most effectively used by Burks, Free- 
man, and others.³ Among the results yet available, these come 
nearest to the perfect solution and are suspect only in so far as (1) 
children needing foster homes are not a true sample in that they do not 


1 Cattell, R. B., and Wilson, J. L.: Op. cit. 
2 Herman, L., and Hogben, L.: “The intellectual resemblance of twins.” Proc. 
Roy. Soc., Edinburgh, Vol. LIII, No. 9, 1932, 1933. 
3 Burks, B. S.: Op. cit. 
Newman, H. H., Freeman, F. N., and Holzinger, K. J.: Twins: A Study 
of Heredity and Environment. 1937. 




have the normal community range of hereditary variability; (2) the 
foster homes may also be deficient in variability; (3) the tests used 
have been of a kind which inspection suggests (and the present article 
shows) are far from culture-free. The measure of environmental 
influence on intelligence clearly includes both c and dG. 

Incidentally, it is surprising, in glancing through any thorough 
survey of research in the nature-nurture field, such as Schwesinger’s,¹ 
to find how little the most promising method of all has been used; 
namely, a combination of methods 3 and 4, for siblings reared apart 
are far more prevalent than separated identical twins, while the ratio 
which can be derived from the measurements is more usefully appli- 
cable to social and individual problems. 

By analysis of variance, the variance of intelligence in the general 
population may be considered as the sum of inter-familial and intra- 
familial variance, each considered both as an hereditary and an 
environmental variance. The ratio which is of most interest, espe- 
cially socially, is: 

    Inter-familial hereditary variance 
    ---------------------------------- 
       Total inter-familial variance 


This, like the other nature-nurture ratios, e.g., 


      Inter-individual hereditary variance 
    ---------------------------------------- 
    Inter-individual total observed variance 


seems derivable from the data of Methods 3 and 4, using variance in 
place of mean difference. The difference of variance in siblings reared 
apart and siblings reared together would give an equation as follows: 

(1) Variance siblings apart = Hereditary variance of siblings 
        + community (inter-individual) environmental variance 

(2) Variance siblings together = Hereditary variance of siblings 
        + intra-familial environmental variance 

where (1) - (2) = Community environmental variance 
                  - intra-familial environmental variance 
                = Inter-familial environmental variance 

By similar simultaneous equations involving the variance of foster 
siblings (unrelated reared together) and the variance of unrelated 
people reared apart, it seems possible to isolate each of the variances 
required for any desired nature-nurture ratio. 
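The subtraction of equation (2) from equation (1) can be sketched as follows. This is only an illustration: the article does not say how the sibling variances are to be estimated, so the helper `pair_variance` (within-pair variance taken as the sum of squared pair differences over 2n) is an assumption, and the scores are invented.

```python
def pair_variance(pairs):
    """Within-pair variance from paired scores: sum((x - y)^2) / (2n).

    This estimator is an assumption; the article does not specify one.
    """
    n = len(pairs)
    return sum((x - y) ** 2 for x, y in pairs) / (2 * n)


def inter_familial_environmental_variance(var_sibs_apart, var_sibs_together):
    """Equation (1) minus equation (2).

    The hereditary variance of siblings appears in both equations and
    cancels, leaving community environmental variance minus intra-familial
    environmental variance.
    """
    return var_sibs_apart - var_sibs_together


# Invented scores for sibling pairs, for illustration only.
apart = [(100, 110), (95, 115), (120, 100)]
together = [(100, 106), (98, 104), (120, 108)]

v_apart = pair_variance(apart)
v_together = pair_variance(together)
print(v_apart, v_together,
      inter_familial_environmental_variance(v_apart, v_together))
```

The same pattern would apply to the further simultaneous equations for foster siblings and unrelated persons reared apart, each difference isolating one more variance term.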


1 Schwesinger, G. C.: Heredity and Environment. New York: Macmillan, 1933. 




Whatever method is used, only a quite extraordinary ingenuity in 
experiment design, heaping complication upon an already complicated 
plan, could aspire to separating c from dG in the experiment itself. 
Indeed the more reasonable alternative seems to be (1) to use a test 
which from the first is known to be relatively free from cultural and 
training effects, or/and (2) to accept the fact that the environmental 
change or difference is (dG + c) not dG and to attempt by a separate 
experiment to estimate the size of c. It is to this preliminary in 
nature-nurture research that the present research is directed. 


III. THE EVALUATION OF C 


It might seem that once dG and c have been separately envisaged 
and defined, the devising of a crucial experimental situation to assess 
them in independence would be a simple sequel, but the task is in fact 
peculiarly difficult, for having had the mental exercise of learning a 
culture and being well versed and informed in it almost invariably go 
together. The former is the cause of the supposed dG; the latter is c, 
the familiarity with cultural content. 

Two possible ways of separation suggest themselves, based on the 
following assumptions: 

(1) The increase in G through exercise should be a function both 
of the time of exercise and the amount of culture encountered. If the 
information and skills were acquired very quickly, the effect on intelli- 
gence might not be so great as if they were exercised for a long time. 
Again if they were acquired in early years when they seemed difficult 
they would offer more exercise than to a mature intelligence. Finally, 
if they are acquired after the age at which mental capacity reaches 
biological maturity their effect should be very small. These three 
effects can be utilized in what we will call here the Time Separation 
Method. It is applied below in the experiment on a group of recent 
immigrants compared with English-speaking natives and retested after 
a short period of acclimatization to the culture. 

(2) The increase in general ability, dG, may be expected to show 
itself in all fields in which G can be tested, whereas the cultural gains 
may be limited to one field, sense modality, or type of problem. 
Cultural gain can thus be assessed in itself and by actual choice of 
tests need not be allowed to intrude into the battery by which 
intelligence is measured. This we shall call the Transfer Effect 
Method, for the notion that intelligence can be improved by mental 
exercise assumes, in some sense, transfer of training. The method is 




applied below in the training of four groups in school, in which each is 
trained in a particular broad field of information and skill and tested 
on the same battery of intelligence tests. 

Incidentally, the fact that no very complete transfer of training 
has ever been found in complex activities is itself an argument against 
dG being expected to achieve any appreciable magnitude. True, most 
transfer experiments have dealt with skills rather than passive, 
insightful judgments, so that any reconsideration of transfer experi- 
ment results from the standpoint of intelligence improvement would 
need augmentation as well as reorientation of data. At the same time 
it would seem desirable to align factor analysis findings with transfer 
tendencies to discover whether difficult and easy transfer is explained 
by the extent and the boundaries of general, group, and specific 
factors. Thereby both factor analysis and transfer concepts might 
lose some of the arbitrariness which arises when they build their own 
worlds in independence. 

In concluding the general discussion relating the present problem 
to wider issues, it must be pointed out that we do not expect to measure 
c in such a way that it can be immediately introduced into the above 
equation for the more accurate calculation of nature-nurture ratios. 
For that purpose the manipulation of our results on the basis of certain 
assumptions will be necessary. Our aim becomes the limited and 
immediate one of finding the relative susceptibility to cultural influ- 
ence—c—of a representative few of the most important intelligence 
tests in common use. 


IV. EXPERIMENT 1, USING TRANSFER EFFECT METHOD 


In the present experiment and in the following Time Separation 
experiment, the same set of intelligence tests was used, since part of our 
aim was to check one method against the other. The tests were chosen 
for their widespread use and popularity as standard measures of 
general ability and to represent each, in the eyes of a bi-factor analyst, 
loading in one of the more important group factors: Verbal ability, 
arithmetical ability, mechanical aptitude, and manual dexterity. This 
last condition was satisfied in order that “cultural training” in each 
test might have a well-defined field not overlapping with that of other 
tests. 

Obviously the term “culture” in this particular experimental set up is 
employed in a narrower sense than is usually used by anthropologists. 
One culture differs from another in language, mores, technical skills, 




and social organization. Our tests and trainings can differ only 
slightly and in a less coördinated manner than cultures, but they can 
differ in some of the matters which are basic in distinguishing cultures, 
such as the amount of emphasis on mechanical knowledge and skill, the 
importance of language and vocabulary, the rôle of mathematical 
reasoning. That is to say, the experiment legitimately reproduces on 
a smaller scale a difference in cultural endowment. 

The tests used were: (1) The Terman-Merrill Revision of the Binet, 
(2) the present Culture-Free Test, (3) the arithmetical sections of the 
American Council on Education Psychological Examination, herein- 
after referred to as the A.C.E. (1938 edition), (4) the Arthur Perform- 
ance Test, and (5) the Ferguson Formboards. The fifth test was 
omitted from the first set of experiments since it resembled closely 
the Arthur. 

The subjects were thirteen- and fourteen-year-old high-school 
freshmen, considered old enough to adjust themselves efficiently to 
the testing and the training and young enough to show improvement of 
intelligence through training, if such occurs. They were divided into 
four groups approximately equal in numbers, age, and intelligence, 
as follows: 


TABLE I.—NATURE OF THE EXPERIMENTAL GROUPS 


            Group 1   Group 2   Group 3   Group 4 
Number         14        14        14        18 
Age           14.9      14.3      14.5      13.9 
IQ            105       107       109       122 


All four groups were tested with all four tests. Then Group 1 was 
trained in verbal knowledge and skill, Group 2 in material of a geo- 
metrical nature¹ similar to that of the Culture-Free Test, Group 3 in 
manipulation of form-boards and performance tests similar to but not 
identical with those used in the Arthur Test, and Group 4 was given 
training in arithmetical and algebraic processes similar to those in 
the A.C.E. 

It is important in interpreting the results later obtained to notice 
that this training was in practice the least satisfactory part of the 


1 Actually most of the training was done with Raven’s “Progressive Matrices” 
Test (see previous article), so that the training for the Culture-Free Test was much 
more of a specific training in the test than were the other trainings. 




experiment, and it is evident that a repetition of the research should 
take very special pains to anticipate and overcome the difficulties 
encountered here. In the first place, all groups were getting training 
in English and in mathematics, so that the hour a week extra training 
of the experimental group came with an effect of “diminishing returns.” 
Secondly, it was not possible to equate the teachers, the motivation, or 
the classroom spirit. Thirdly, the groups began at very different levels 
of the learning curve with respect to the different kinds of training. 
Language and vocabulary were probably nearing saturation level for 
most of the children, whereas the material and set-up of the Culture- 
Free tests were novel to them and offered scope for rapid improvement 
in what little culture habit it contained. 

The groups were trained for six weeks, one hour a week, divided into 
five daily twelve-minute periods. An account of the training material 
and methods is set out in greater detail elsewhere.¹ Each group was 
then retested with all four tests. In the retesting the M form of the 
Stanford Binet was substituted for the L form; and the second form 
of the Arthur for the first form. Since there is only one form of the 
Culture-Free Test, the even numbers of the more difficult items were 
blocked out on the first test and the odd items on the retest. It was 
considered that a two-month interval would suffice for the forgetting of 
the easier, less diagnostic items. If any slight carry-over should occur, 
it would be in the direction of decreasing the relative culture freedom 
of the test. 

Before studying the “increase in intelligence” through training, 
either in the test having cultural elements similar to those in which the 
group was trained or in the test not culturally related to the training, 
it was necessary to allow for the normal maturation of intelligence in 
children of this age. After calculating this correction in Binet terms, 
appropriately for the average age and IQ of each group, it was also 
transformed, via standard T scores, to the other test measurements 
and in every case subtracted from the observed test-retest differences. 

The remaining differences as shown in Table II are practically all 
positive, and in most instances the improvement in a test is far greater 
for the group trained in the material similar to the test than in the 
groups trained in other cultural fields. The differences are in terms of 
T scores calculated from the mean (test-retest) standard deviation for 
each test. 


1 Sarason, Seymour: The Effects of Training on Four Intelligence Tests. Unpub- 
lished thesis, Clark University Library. 




The improvement on each test resulting from deliberate training 
in its own culture medium is now set beside the mean improvement in 
that test resulting from three other and more remote types of training 


TABLE II.—“INTELLIGENCE” GAINS AS A RESULT OF CULTURAL GAIN AND TEST 
SOPHISTICATION 


                                      Training in 
                Language     Geometrical   Performance,      Mathematics, 
Test                                       as in G. Arthur   as in A.C.E. 
                (Group 1)    (Group 2)     (Group 3)         (Group 4) 

Binet             3.35*        5.95          1.35              3.50 
Culture-Free      7.15         9.35*         4.14              2.50 
A.C.E.            3.75         5.95          6.35             19.00* 


* Gains made when training corresponded to nature of test. 


(plus test practice and sophistication, i.e., t), i.e., in Table III. If the 
right-hand column can really be considered to list the improvements in 
general mental ability, i.e., the dG values, or even if it only represents 
test sophistication and practice, t, it is clear that the proper measure 
of pure c gain is to be obtained by subtracting the right-hand column 


TABLE III.—GAINS FROM DIRECT AND INDIRECT TRAINING 


Own training:                Other training: 

c gain + dG gain + t         dG gain + t 


from the left, or by dividing the right-hand column into the left. 
Incidentally, a comparison of these gains of the “not-directly-trained” 
groups (other training in above table) with gains made by a small 
group tested and retested with these tests after a day’s interval, show- 
ing a rough equality, indicates that they must be considered as test 
sophistication t rather than as spread of training or intelligence gain, 
dG. The expression of the c gain as a difference or a ratio can again be 
alternatively expressed, either in terms of simple standard scores 
(T scores) or in terms of critical ratios, which take into account the 




dispersion of the experimental groups and are obtained by dividing 
the gain by the standard error of the difference of the two (correlated, 
test-retest) groups. This gives four alternatives in expressing the c 
gain or cultural susceptibility, which are shown in Table IV. 
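The four alternatives just described can be sketched as below. This is an illustration under stated assumptions, not the authors' computation: the function names are hypothetical, the numbers are invented, and the standard error of a difference between correlated means is taken in its usual textbook form, sqrt(se1² + se2² - 2·r·se1·se2).

```python
import math


def c_gain(own_training_gain, other_training_gains):
    """Return (difference, ratio) of own-training gain against the mean
    gain of the groups trained in other fields."""
    baseline = sum(other_training_gains) / len(other_training_gains)
    return own_training_gain - baseline, own_training_gain / baseline


def se_correlated_difference(sd_test, sd_retest, r, n):
    """Standard error of the difference between correlated test and retest
    means (textbook form, assumed here)."""
    se1 = sd_test / math.sqrt(n)
    se2 = sd_retest / math.sqrt(n)
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)


def critical_ratio(gain, standard_error):
    """A gain expressed in units of its standard error."""
    return gain / standard_error


# Invented values: one directly trained group, three indirectly trained.
difference, ratio = c_gain(8.0, [4.0, 3.0, 5.0])
se = se_correlated_difference(sd_test=10.0, sd_retest=10.0, r=0.5, n=25)
print(difference, ratio, critical_ratio(difference, se))
```

Expressing the gain as a critical ratio weights it by the dispersion and the test-retest correlation of the group, which is why the score and critical-ratio forms need not rank the tests identically.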


TABLE IV.—SUSCEPTIBILITY OF TESTS TO TRAINING 


                 Difference,      Ratio,           Difference,      Ratio, 
                 trained and      trained and      trained and      trained and 
                 untrained, in    untrained, in    untrained, in    untrained, in 
                 terms of         terms of         terms of         terms of 
                 critical ratio   critical ratio   score            score 

Culture-Free        .37             1.13             4.97             2.13 
                   -.35             -.88             -.08             -.99 
                    .75             1.59             1.18             1.61 


Whichever way the results are expressed, the A.C.E. seems most 
susceptible to cultural influence and the Arthur least. The Culture- 
Free and the Binet vie for second place. Since it may be argued that 
the critical ratio is a more comparable indicator of the validity of the 
difference, the Culture-Free test can with most accuracy have assigned 
to it the second position in freedom from cultural contamination. 


V. EXPERIMENT 2, USING TIME EFFECT METHOD 


A group of thirty-seven recent adult immigrants to America was 
collected and tested with the five tests mentioned, pantomime direc- 
tions being added to verbal directions where the standardization of 
the test indicated that understanding the instructions is not essentially 
part of the test situation. The group was mostly German-Jewish 
refugees, but also included Lithuanians, Poles, and Albanians. Their 
average residence in America was fourteen months and all spoke 
English with some facility. 

An attempt was made to equate this experimental group with a 
group of native Americans of the same age and IQ, but despite the fact 
that many of the immigrants were professional men and women the 
control group selected to have about the same professional representa- 
tion turned out to have a higher mean IQ and a smaller scatter. The 
lower mean and the greater scatter of the immigrants seem to be due 




largely to the Lithuanian, Albanian, and Polish sample which happened 
to be very low. Most of the immigrants had been established long 
enough to have settled their immediate emotional difficulties and no 
subject was used who showed conflict or nervous strain over the test, 
eleven being dropped completely from the experiment for this reason 
after the first testing. Both groups were retested after a lapse of 
between two and three months (seventy-seven days for each subject). 
In this time the growth of the immigrant group in American culture 
ways and language was noticeable even in casual conversation. 

The improvements on the various tests were as shown in Table V. 


TABLE V.—IMPROVEMENT AT RETEST 


Immigrants Controls 


In order to be comparable, they are expressed in terms of the mean 
(test-retest) standard deviation of the control group. (The immi- 
grant group was not allowed to contribute to this standardization 
because its standard deviation departed so markedly from that com- 
monly found in the general population.) 

Now, since no cultural gain can be supposed on the part of the 
control group, the whole improvement of the controls must be put 
down to test sophistication and practice, t. The subtraction of this 
value, t, from the gain of the immigrants, t + c, should yield the gain 
due to acculturation, c. To make this correction most accurate, it 
seems reasonable to take into account the greater mean intelligence of 
the controls. We shall assume that gain through test sophistication 
and practice, like any complex learning activity, is closely proportional 
to the individual’s intelligence. Before subtracting t, therefore, we 
shall multiply it by the ratio 


    mean intelligence of immigrants 
    ------------------------------- 
     mean intelligence of controls 
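The correction just described amounts to a single line of arithmetic, sketched here with a hypothetical function name and invented figures; it assumes, as the text does, that the sophistication gain t scales in proportion to mean intelligence.

```python
def acculturation_gain(immigrant_gain, control_gain,
                       mean_iq_immigrants, mean_iq_controls):
    """Estimate c as the immigrant gain (t + c) minus the control gain (t)
    after scaling t by the ratio of the two groups' mean intelligence."""
    t_scaled = control_gain * (mean_iq_immigrants / mean_iq_controls)
    return immigrant_gain - t_scaled


# Invented figures: gains in standard-score units, means in IQ points.
c = acculturation_gain(immigrant_gain=6.0, control_gain=4.0,
                       mean_iq_immigrants=100.0, mean_iq_controls=125.0)
print(round(c, 2))
```

Because the controls here have the higher mean, the scaled t is smaller than the raw control gain, so the correction raises the estimate of c relative to a plain subtraction.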


As in Experiment 1, the resulting c increases can be expressed 
either in terms of standard, T, scores or further refined as critical 
ratios through dividing by the standard error of the difference of the 
correlated test and retest means, Table VI. 


TABLE VI.—CULTURAL SUSCEPTIBILITIES: c VALUES 


Standard scores | Critical ratios 


If we take the critical ratio as being more diagnostic, the results of 
the present experiment agree with the first in giving the following order 
of freedom from cultural influence: 

1. Performance Tests (Arthur¹ and Ferguson). 

2. Culture-Free Test. 

3. Terman-Merrill Revision of Binet. 

4. A.C.E. Test. 

If the difference in terms of standard scores is taken, (1) the two experi- 
ments do not agree so well, though still similar in trend, and (2) the 
chief difference from the above result is a rise in the position of the 
Binet. 

As between the two experiments, the difference arising through the 
last method of scoring is almost solely due to the Binet having a more 
culture-free rating in the child than in the adult experiment. There is 
also a suggestion that the Performance Tests are relatively more cul- 
ture free with the children. Possibly children are already compara- 
tively overtrained in the Performance Test type of situation in their 
everyday life experiences. 

That the Binet seems relatively less culture-biassed for the children 
is most probably due to the relative weakness of the training (in lan- 
guage usage) which could be brought to bear in the experiment situ- 
ation. 

Considering the difference in the kind of cultural influence employed 
in the two experiments, it is surprising, and gratifying, that their 
results agree so well and that there is psychological consistency and 
sense in them, e.g., both performance tests behave in the same way. 
But where they disagree it seems reasonable to give decidedly more 
significance to the second experiment. For its use of the term “culture”
corresponds more closely to the usual broad use of the term; in
Experiment 1 we deal only with small abstracted sectors of culture.
Moreover, the situation of Experiment 2 certainly resembles most
closely that in which one would in practice moot the use of a
culture-free type of test.


1 Note, however, the great “test sophistication” gain of this test.


VI. THE VALIDITY AND CONSISTENCY OF THE TESTS AS INTELLIGENCE TESTS


A test may be free from cultural weighting and yet not be an 
intelligence test or a consistent measure of anything. The consistencies
of the tests used here, with an interval of six weeks in the case of
the children, and seventy-seven days in the case of the adults, are 
shown in Table VII. 


TABLE VII.—CONSISTENCY COEFFICIENTS OF TESTS


Children  | Adults: Immigrants | Adults: Natives

.71 (.81) | .91 (.96)          | .87 (.93)
.68 (.78) | .82 (.90)          | .82 (.90)
.59 (.81) | .64 (.82)          | .62 (.80)
.43 (.55) | .55 (.71)          | .66 (.80)


The consistencies for the children and for the (mean) adults are in 
the same order. With the exception of the Ferguson Form Boards, 
all the tests were, for the adults, roughly equal in duration so the 
greater differences in consistency cannot be ascribed to length. How- 
ever, if the consistency of the Culture-Free, which averaged four-fifths 
the length of the Binet, is corrected to the same length by the Spear- 
man-Brown Prophecy formula, the consistencies of these two types of 
test performance cease to be distinguishable (values in parentheses 
Table VII). The length-corrected coefficients are also shown for the 
other tests. It should not be overlooked, however, that the uncor- 
rected coefficients are correct for the tests as usually given. 
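The length correction mentioned above can be sketched as follows. This is a minimal illustration assuming the standard form of the Spearman-Brown prophecy formula, r_k = k r / (1 + (k - 1) r); the coefficient .70 is hypothetical and is not taken from Table VII, and the function name is ours.

```python
def spearman_brown(r, length_ratio):
    """Predicted consistency of a test lengthened by the factor length_ratio.

    r: observed consistency coefficient of the test as given
    length_ratio: new length divided by present length (k)
    """
    return length_ratio * r / (1 + (length_ratio - 1) * r)


# A test four-fifths as long as the comparison test must be lengthened
# by a factor of 5/4 to be put on the same footing.
observed = 0.70          # hypothetical coefficient
corrected = spearman_brown(observed, 5 / 4)
print(round(corrected, 3))
```

Because the correction always raises a positive coefficient when the test is lengthened, the parenthesized values in a table of this kind are necessarily at least as large as the uncorrected ones.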

As charged in the introductory article to these experiments,¹ the
current practice in discovering the validity of intelligence tests is more 
chaotic and contradictory than most psychologists care to admit. 

Some base validity on the assumption that scores on intelligence 
tests, with children, increase with age. But the same is true of many 
totally acquired skills in a homogeneous group in which intensive 

1 Cattell, R. B.: “A culture-free intelligence test I.” J. educ. Psychol., Vol. 
XXXI, 1940, pp. 161-180.


teaching gives a steep gradient with age. It is also as true of special 
abilities as of intelligence. Logically pursued it would lead to measur- 
ing intelligence by stature, by the number of molars or by spelling 
ability. Some would proceed by comparing borderline defectives with 
younger and brighter children, or the latter with geniuses. This is all 
right if defectives and geniuses are (1) rightly selected in the first 
place and (2) not likely to differ from normals except in intelligence. 
A third prevalent method is to validate the test by comparison with 
real life success mainly (1) in school (comparison of grades) and (2) 
in social status (comparison of occupations). This has the same weakness
as the last method: success or failure depends on far more than
intelligence, e.g., on industry, health, and persistence, which will
then become included in intelligence tests. Others, again,
validate each new intelligence test against an old one in an apostolic 
succession. If it takes one original test as the criterion, this is the 
silliest validity theory of all; if several tests are the criterion, it must 
bestir itself to provide an adequate hypothesis for weighting. Many 
psychologists use all four of these methods leavened by a liberal 
admixture of common sense, a procedure satisfactory to a cook but 
not to a chemist. 

The present writers take the view that intelligence is whatever
general factor¹ can be separated out from a pool of tests containing
mainly tests of an intelligence testing kind, i.e., demanding insight into
complex relations, ability to learn, to think abstractly, and to adapt
means to purpose in new situations. That is to say, the first choice of
test material and form is made on psychological grounds, but the
refinement of the concept of intelligence, and the distillation of
more and more saturated measures of it, by inspection of which the
concept may be made more precise, is a matter for factor analysis.
Whether our predilections run to a sampling theory with Thomson²
or to a general factor theory with Spearman³ and Holzinger,⁴ our
actual practical procedure in selecting tests by correlation methods
will be the same and equally remote from the more haphazard validating
methods described above.


1 Accepting the analysis into a general factor (followed by determination of 
group factors and specifics) seems (1) more psychologically meaningful and (2) 
less arbitrary than accepting the complex conditions of Thurstone’s simple struc- 
ture theory. 

2 Thomson, G. H.: The factorial analysis of human abilities. 1939.

3 Spearman, C.: The abilities of man. 1932.

4 Holzinger, K. J., Swineford, F., and Harman, H.: Student manual of factor
analysis. 1937.



The tests employed in the present research offer an excellent pool 
from which to distill an all-round general factor or to derive tests 
sampling the greatest number of bonds, for each has stood up to the 
common-sense criticisms of a universe of psychologists and each has 
become a standard and widely used test related in various researches 
to so many diverse proficiencies in everyday life that it may well be 
accepted as a hostage for all of them. 

The mean intercorrelations of each test with the others in the two 
researches here represented are presented in Table VIII. The means 
are taken through Fisher’s z function. 
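Averaging correlations through Fisher's z function can be sketched as below. This is an illustration only: each r is transformed to z = artanh(r), the z values are averaged, and the mean is transformed back. The six coefficients are the first row of Table VIII read as two-place decimals, which is our assumption since the decimal points are not legible; the function name is hypothetical.

```python
import math


def mean_correlation(rs):
    """Average correlation coefficients via Fisher's z transformation."""
    zs = [math.atanh(r) for r in rs]      # r -> z
    mean_z = sum(zs) / len(zs)            # average on the z scale
    return math.tanh(mean_z)              # z -> r


# First row of Table VIII, read as decimals (an assumption).
row = [0.60, 0.56, 0.54, 0.57, 0.45, 0.36]
fisher_mean = mean_correlation(row)
plain_mean = sum(row) / len(row)
print(round(fisher_mean, 3), round(plain_mean, 3))
```

The z-based mean comes out slightly above the plain arithmetic mean, because the transformation gives somewhat more weight to the larger coefficients.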


TABLE VIII.—MEAN INTERCORRELATION OF EACH TEST WITH


Children  | Adult immigrants | Adult natives | Mean in all
1st | 2nd | 1st | 2nd        | 1st | 2nd     | situations

60 | 56 | 54 | 57 | 45 | 36 | 52
[illegible] | 56 | 65 | 49 | 37 | 41 | 50
44 | 57 | 33 | 44 | 43 | 39 | 44
10 | 13 | 25 | 20 | 18


In testing for a general factor the tests were first rearranged in 
whatever order gave the best hierarchy in the correlation table, and 
this was done by independent workers for each of the following six 
situations, so that no one hierarchical order should be influenced by 
knowledge of what hierarchy was being found elsewhere. 

The hierarchies were then tested by the tetrad difference criterion. 
Some resolved into a single general factor and specifics, others into
general factor, group factors, and specifics. The general factor load- 
ings and the group factors are indicated beside each correlation table. 
(Table IX.) That some of the factor loadings apparently go above 
unity is due to the approximate method and small number employed. 
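The tetrad difference criterion can be illustrated with a small sketch. It assumes the usual statement of the criterion: if four tests share only one general factor, so that each correlation is the product of the two g loadings, all three tetrad differences vanish, and a group factor shared by two of the tests leaves a non-zero residual. The loadings and the size of the disturbance below are hypothetical, and the function names are ours.

```python
def tetrad_differences(r, a, b, c, d):
    """Return the three tetrad differences for tests a, b, c, d."""
    return (
        r[a][b] * r[c][d] - r[a][c] * r[b][d],
        r[a][b] * r[c][d] - r[a][d] * r[b][c],
        r[a][c] * r[b][d] - r[a][d] * r[b][c],
    )


def one_factor_matrix(loadings):
    """Correlation matrix implied by a single general factor."""
    n = len(loadings)
    return [
        [1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical g loadings for four tests.
loadings = [0.9, 0.8, 0.6, 0.4]

# Case 1: general factor only, so every tetrad difference is zero.
r = one_factor_matrix(loadings)
pure = tetrad_differences(r, 0, 1, 2, 3)

# Case 2: tests 2 and 3 also share a group factor adding .20 to their
# correlation, so the tetrads involving that correlation no longer vanish.
r_group = one_factor_matrix(loadings)
r_group[2][3] = r_group[3][2] = r_group[2][3] + 0.20
disturbed = tetrad_differences(r_group, 0, 1, 2, 3)

print(pure)
print(disturbed)
```

With sampling error the observed tetrads never vanish exactly, so in practice each is compared with its probable error before a group factor is inferred.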

That the same test has different g saturations involves no contra- 
dictions, quite apart from any question of the uniqueness of g, when 
one reflects that a test is ‘‘all things to all men.” Strictly, “validity” 
is not a term to apply to a test but to a relation between a test and a 
subject. A good intelligence test for one age may be, as is well known, 
a poor test at another, or a good test when approached with one set 
of preconceptions and a poor test to subjects of a different mental 
background. 



TABLE IX.—VALIDITIES OF THE TESTS EMPLOYED


Correlation Matrix I Correlation Matrix II 
Children: First Test Children: Retest 
tion tion 
1. Culture-Free...... . 89 87 
58] .56 . 67 3. Culture-Free....... .68 
54] .43 |.36 . 55) . 55) .43 64 
Correlation Matrix III Correlation Matrix IV 
Immigrants: First Test Immigrants: Retest 
G satura- eatura- 
Be 3 4/5 
1. Culture-Free. .|...)...)..... .99 1. Culture-Free 
4. Arthur........ .43].37| .44)...]..| .62 94. A.C.E......... .50 
5. Ferguson...... .17| — . 5. Ferguson...... 16) .07) . 25) .05). . 


Group factors, Matrix III: (1) Slight: negative Ferguson and A.C.E.
                               (Residual: -.12).

Group factors, Matrix IV:  (1) Slight: A.C.E. and Binet (Residual: .24).
                           (2) Slight: Arthur and Ferguson (Residual: .18).
Correlation Matrix V Correlation Matrix VI 
G satura- 
1; 2 3 4/5 tion 
4. Culture-Free...|.32|.82) .26).../.. 53 4. Culture-Free. .35 
5. Ferguson...... . 26) .13] — . 19) .42).. 35 5. Ferguson...... . 19} .09| .27).. .18 


Group factors, Matrix V:  (1) Large: Arthur and Culture-Free
                              (Residual: .53).
                          (2) Appreciable: negative A.C.E. and Ferguson
                              (Residual: -.87).
                          (3) Slight: Culture-Free and Ferguson
                              (Residual: .23).

Group factors, Matrix VI: (1) Appreciable: Culture-Free and Arthur
                              (Residual: .40).
                          (2) Slight: Culture-Free and Ferguson
                              (Residual: .21).


Group Factors Summarized: 
(1) Negative Ferguson and A.C.E. (Twice) 
(2) Arthur and Culture-Free (Twice) 
(3) Ferguson and Culture-Free (Twice) 
(4) Arthur and Ferguson 
(5) Binet and A.C.E. (with accultured immigrants) 



However, one would expect consistency in these inconsistencies; for 
example, with populations of about the same age or sex or background, 
the saturations should be more similar than when these are very diverse. 
That is what we find in the present inquiry. In all three groups, the 
Culture-Free Test has a higher standing when first encountered than 
when met for the second time. (Is this due to using the same test 
form in the case of this test and not the others?) Further, with 
children and with immigrants, i.e., in general those at a lower cultural
or intelligence level, the Culture-Free Test has a higher validity than 
any of the other tests. As one comes to the highly cultured native 
group, however, most of whom were teachers, the order changes and 
the Binet and A.C.E., charged with many scholastic skills, become the 
more saturated measures of general ability. If mean intercorrelation 
with the pool is taken as the criterion, however, the Culture-Free 
remains the most valid test, on the average of all situations. 


VII. CONCLUSIONS AND DISCUSSION 


(1) An intelligence test designed on the principles described in a 
previous article¹ has a validity in culturally homogeneous populations
comparable to that of the most approved tests in present use. With 
populations of a low or heterogeneous cultural level, it has superior 
validity to existing tests. 

(2) When cultural influence is measured by a comparison of immi- 
grant and native groups, the test is decidedly more culture free than 
the Binet or A.C.E. tests but not quite so free as performance tests. 

(3) When culture susceptibility is measured by the effects of deliber- 
ate training in the appropriate field of information and skills, the 
Culture-Free Test occupies the same relative rank order as stated in 
conclusion (2) above, but departs somewhat more from the perform- 
ance test level. In connection with the ambiguity of some results in 
this section, one should point out the desirability of improving on the 
present methods as follows: (a) Instead of attempting to obtain test 
forms of exactly equal difficulty for test and retest, it would be better 
to split each experimental group, testing half on one form and half on 
another, alternating on the retest; (b) though it is theoretically impossi- 
ble to give “‘equal amounts” of training in entirely different types of 
performance, starting at different levels of learning and interest, the 
training could, with more control of the school time-table, have been 
more equalized than we were able to make it. The results of these 


1 Cattell, R. B.: “A culture-free intelligence test I.” J. educ. Psychol., Vol.
XXXI, 1940, pp. 161-180.

deficiencies (a and b) in the present research were in the direction of
reducing the training effect on the Binet, increasing it on the Culture- 
Free Test, and reducing the G validity at retest of the A.C.E. and 
Culture-Free. 

(4) All the tests used showed some gain through test sophistication, 
t, but only the cultural gain, c, on those tests which showed any, 
reached sufficient magnitude to be definitely statistically significant. 

(5) A formula has been presented to analyze the score on an intelli- 
gence test into its constituent variables, and this formula proves 
practicable in clarifying experimental results. 

(6) An analysis is made of the methods of investigating the nature- 
nurture problem with respect to intelligence, taking into account the 
existence of cultural influence in test performance. 

(7) Although some of the tests, notably the Terman-Merrill, are as 
valid as the Culture-Free Test, and others, notably the Arthur, are 
quite as free from susceptibility to cultural influence, the new Culture- 
Free Test is the only one which combines both of these advantages. 

(8) Finally, it is necessary to point out that “susceptibility to cul- 
tural influence” in the above context strictly refers only to the cul- 
tural ranges in these experiments. Every performance has some 
learning in it. Most performances have some saturation level deter- 
mined by physical, physiological, and hereditary traits, beyond which 
further training gives negligible returns. A culture-free intelligence 
test is one expressed in a type of performance in which (a) intelligence 
rather than physical or special aptitude factors are concerned and 
(b) life experience has brought the performance to its highest heredi- 
tary limit for the greatest number of people. Spatial perception, as 
employed in the new test, is likely to be brought to the level of over- 
learning, not only in people of European cultures but even among 
primitives in widely different habitats. On the other hand, the verbal, 
symbolic, or mechanical skills employed in the other tests we have 
examined, and which proved to be quite differently dispersed even 
among people of closely related cultures, would obviously be as com- 
pletely impossible for comparing primitives as they are doubtful or 
misleading in comparing, say, different social levels in one and the 
same culture. It, therefore, seems desirable to extend research with 
the present test to comparisons of most widely separated cultures, 
primitive and otherwise. 


1 The writers wish to thank the Worcester and Leicester, Massachusetts, school 
systems for generous assistance in the experiment. 



STUDIES IN THE PSYCHOLOGY OF MEMORIZING 
PIANO MUSIC. V: A COMPARISON OF PRE-STUDY 
PERIODS OF VARIED LENGTH* 


GRACE RUBIN-RABSON 
New York City 


Of the several techniques thus far examined in the construction of
an adequate formula for learning-memorizing† pianoforte music, none
has yielded such marked results as the analytical study of the material
preliminary to keyboard completion. This factor, determined in the
first study of this series,* has been incorporated in every subsequent
experimental situation in several different forms.

If, by using a small number of competent pianists whose musical
talents are highly (though perhaps unequally) developed, tonal aware- 
ness during silent study and the ability to organize the material into 
meaningful units are guaranteed, then several extremely important 
aspects of the problem open up for examination: What is the optimum 
length of the period of silent study? To what point in the learning 
should it be continued—partly learned, wholly learned, or over- 
learned? What is the effect of these various stages on keyboard 
learning? Will they show economies in retention? 

Reference to the rather scant work in relevant areas furnishes few 
clews. Kovacs‘ tested the efficacy of learning small fragments away 
from the keyboard, but had only the subjects’ announcement of 
readiness to indicate the state of the learning. No time limit was set, 
no proof of any kind of mental or tonal image required (though his 
subjects were asked to concentrate on these), no indication that any- 
thing but covert kinaesthetic patterns were guiding the memorizing. 
However, within these limitations, he established the superiority of 
the away-from-keyboard learning approach. 

Insofar as certain superficial aspects of piano performance reveal 
an orderly sequence of muscle patterns, perhaps some relationship 
exists between the planning of these sequences and the running of a 
maze when the labyrinth has already been observed” or when the 
entire field is visualized, as in a pencil maze.® Such perception 
naturally facilitates learning. 


* Thanks are due Sophie Rabinovitch for a grant-in-aid. 

† Throughout these studies “learning” is used interchangeably with
“memorizing” since even at its inception the learning is directed toward
immediate memorizing.



The conclusions of Ebbinghaus,' Witasek,'! and Gates’ concerning 
the practical values of recitation over reading have been fundamental 
in establishing learning approaches throughout this entire series. 
Since the objective is always to memorize as quickly as possible, no 
time is lost in wasteful reading. 

Recitation of piano material may occur in several forms. The 
material can be visualized as though mentally photographed. The 
melodic line in both hands can be either imagined or sung when it is 
adaptable to this purpose. Or both hands can be projected mentally 
on a keyboard with or without overt kinaesthetic behavior. Finally, 
all these things may occur together and the tonal, visual, and kinaes- 
thetic factors coalesce, so that, even without any muscular movement, 
the whole may be rehearsed with nearly the same vividness as exists 
during actual performance. Transcribing the material from memory 
provides a form of recitation which is possibly the best check on the 
accuracy and clarity of the organization of the musical units. Actual 
keyboard performance is, however, obviously the closest analogue to 
the recitation of verbal materials. 


THE PROBLEM 


The learning-memorizing of pianoforte music for dependable per- 
formance involves many factors: Formulation and organization of the 
large and detailed elements of the form and structure on a key, chord, 
melodic, and tonal basis; recognition of transpositions and similarity 
of materials for transfer of learning; grouping of units into hand 
patterns which often do not conform with either rhythmic or structural 
patterns; welding all these together through repeated kinaesthetic 
sequence on the keyboard. 

Since the intellectual concepts are better apprehended when free 
from the exigencies of maintaining smooth and rhythmic flow in 
performance, silent study and analysis must precede keyboard practice. 
Granted this necessity, how far shall the learning be carried before 
keyboard practice? This can be measured by allowing arbitrary 
periods of three, six, and nine minutes of silent study and checking 
this for learning stage by transcriptions of the memorized material. 

What economies derive from these three periods when evaluated 
by the number of subsequent keyboard trials needed to bring the 
material to memorized perfection? What relative retention is offered 
by this varied preparation after an interval of two weeks? 



Psychology of Memorizing Piano Music 103 


THE EXPERIMENT 


Subjects.—Nine adults, all competent musicians and skillful pianists, 
(see Table I) acted as subjects. By their own admission, silent reading 
of the fairly simple experimental materials produced nearly as vivid 
“inner” tonal hearing as actual keyboard performance. In addition, 
their extensive theoretical backgrounds (and in some cases considerable 
creative ability) insured ready insight into the grouping and organi- 
zation of the material for rapid learning.* 

Experimental Materials—The nine experimental compositions, 
ranging from five to eight measures in length, were arranged and 
adapted from unfamiliar Eighteenth Century music. Seven of them 
were taken from ‘‘The Seasons” by Haydn, one was by Zipoli, one by 
Martini. They were all of about first-grade difficulty, and most differ- 
ences in difficulty were a function of the length (Table I). 


TABLE I.—MUSICAL BACKGROUND AND EXPERIMENTAL ACHIEVEMENT OF THE
SUBJECTS AND RANGE OF THE EXPERIMENTAL COMPOSITIONS


                               | Mean | Range

Subjects
Theoretical training (terms)   | 16.4 | 11-20

Compositions
                               | 13.1 | 6.9-38.1


Experimental Design.—As in previous experiments, the four vari- 
ables of groups of subjects, methods, compositions and order of presen- 
tation were rotated. The influence of individual differences, differences 
in the compositions and the order of presentation operates equally in 
all learnings and any resulting differences can be ascribed to the 
methods. Since each subject performed the experiment three times, 
this design was further elaborated by using a complete rotating scheme 


* In this situation, then, there is reasonable assurance that the longer study 
period is not a mere “grinding in’”’ of a rote learning or a longer staring at the 
printed image to procure photographic reproduction. 



for each three subjects and setting the three designs together to form a 
large Latin-Greek Square. (See Table II.) 
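The rotation idea behind such a design can be sketched in a few lines. The example below is a generic cyclic Latin square and is not a reproduction of the arrangement in Table II: rows stand for the three subjects of one group, columns for the three positions in the order of presentation, and entries for the methods A, B, and C. The helper names are hypothetical.

```python
METHODS = ["A", "B", "C"]   # three-, six-, and nine-minute study periods


def cyclic_latin_square(symbols):
    """Build an n x n square by shifting the symbol list one place per row."""
    n = len(symbols)
    return [[symbols[(row + col) % n] for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """True when every symbol occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


square = cyclic_latin_square(METHODS)
for subject, row in enumerate(square, start=1):
    print(f"subject {subject}: {' '.join(row)}")
```

Because each method appears once in every row and once in every column, differences among subjects and among order positions are balanced across the three methods, which is the property the rotation is meant to secure.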


TABLE II.—DESIGN
In which A, B, and C represent the Three-, Six-, and Nine-Minute Study
Periods; I, II, III, the Three Groups of Three Subjects Each; and the Arabic
Numerals, the Compositions.


As Bs 
B; Cs Ao 
C; As By 
As By Cs 
A 4 B; Cs 
Bs C 9 A 


Procedure.—All learnings were done individually, only one subject 
and the experimenter being present. Three learnings were accom- 
plished on each of three successive days. When the subject received 
the material to study, he was told that when time was called he would 
write it from memory. He was unaware whether he would have three, 
six, or nine minutes for the learning. This insured an equal intensity 
from the beginning of each learning and brought it to completion as 
soon as possible, so that any extra time past this point could be 
analyzed for over-learning. In writing, he worked at his own speed 
and turned in the score when ready. Written without tension, this 
transcription provides a fair estimate of what had been accomplished 
in the study period. The learning was then continued at the key- 
board, each trial accomplished from beginning to end of the composi- 
tion without interruption,® by the coérdinated approach,’ and brought 
to perfect memorized performance at one sitting.’ 

Exactly two weeks later the materials were relearned in the same
order but without preliminary study period.


ANALYSIS OF DATA 


Learning Trials.—The differences by the three methods in the 
number of keyboard trials required to bring the learning to perfect 
memorized performance are tested for significance (Table III). The 
three most capable subjects are contrasted with the three least capable 
to discover whether economies in the three methods are equally 



influential for both types (Table IV). The variance of these trials is 
analyzed for the relative contributions of the four variables (Table V). 

Transcription Scores.—The differences in errors in transcription by 
the three methods are compared for reliability (Table VI). These 
differences are also summarized (Table VII). The variance is analyzed 
for the contribution of the four variables (Table V). The most and 
least capable subjects are contrasted for relative economies by the 
three methods as shown by the transcriptions (Table IV). 


TABLE III.—MEANS AND STANDARD DEVIATIONS OF THE LEARNING TRIALS AND
THE SIGNIFICANCE OF THE DIFFERENCES OF THE MEANS


Method                        | Mean | SD   | Diff. of means | SE diff. | Critical ratio

A—three-minute study period   | 8.00 | 3.24 | A-B, 2.82      | .78      | 3.62
B—six-minute study period     | 5.18 | 3.14 | A-C, 3.93      | .81      | 4.85
C—nine-minute study period    | 4.07 | 2.58 | B-C, 1.11      | .79      | 1.40


Relearning Trials.—The significance of the differences of the means of
the relearning trials is evaluated as a gauge of the relative retention
superiorities of the three methods (Table VIII), and the variance of
the relearning trials is analyzed (Table V) to test these superiorities
further.

Learning Trials.—Table III shows that the six-minute study period 
(B) reduces significantly the number of trials required at the keyboard 
by A (three minutes) to bring the material to smooth memorized 
performance. The economy is 2.82 trials in favor of the longer study 
period with a critical ratio of 3.62. The addition of an extra three 
minutes of study as in C does not produce a proportional decrease and 
the difference of 1.11 trials between B and C is unreliable. C reduces 
the number required for A by 3.93 trials, with a large critical ratio 
of 4.85. The standard deviation indicates a progressive diminution, 
due, perhaps, to the need of the slower subjects for a longer study 
period to achieve the same learning point reached by the quicker ones 
in less time. Table IV sums up the relative economies by the three
methods for the three most and three least capable subjects.*

In the keyboard trials these contrasted economies are not very 
striking nor even consistent for the two kinds of subjects. On the 
whole, in this respect, it can not be maintained that speedier or slower 
learners derive particular benefits from the increased study period. 


* These percentage estimates are offered tentatively and only to indicate possi- 
ble trends should the differences be conspicuously large. 



TABLE IV.—A COMPARISON IN PERCENTAGES OF THE RELATIVE GAIN BY THE
THREE METHODS FOR THE THREE MOST CAPABLE AND THE THREE LEAST
CAPABLE SUBJECTS


                        | Per cent of group average* | Per cent in advantage in methods

Learning Trials
A—(three minutes).
                        | 112 | B over A, 43
                        | 168 | B over A, 50
B—(six minutes).
                        | 118 | C over B, 20
C—(nine minutes).
                        | 37  | C over A, 75
Transcription Scores
A—(three minutes).
Three least             | 218 | B over A, 88
B—(six minutes).
Three most capable      | 34  | C over B, 21
Three least capable     | 130 | C over B, 74
C—(nine minutes).
Three least capable     | 21  | C over A, 162


* The averages of each composition for all subjects by all methods are compared
with the averages of the three subjects under consideration.


The analysis of the variance of the learning trials (Table V) proves
the method to be the most important variable, leading by sixty-one
per cent, and with a ratio five times Fisher’s* one per cent value (that
value which might be exceeded in random sampling from a homogeneous
population once in a hundred trials). The compositions
account for seventeen per cent, probably due to their various lengths;
the subjects for only twelve per cent, indicating a certain homogeneity
in capacity. Both of these latter ratios exceed Fisher’s values. Order
of learning at seven per cent shows some experimental adaptation and
the unknown factors account for only three per cent. Both are
satisfactorily small.
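The figures quoted for the learning trials can be checked with a few lines of arithmetic. The sketch below uses only the percentages stated in the text, together with the ratio for methods (25.7) and the corresponding one per cent value (4.98) as we read them from Table V; the variable names are ours, and the reading of the table is an assumption.

```python
# Percentage of the total variance of the learning trials, as quoted in the text.
percent_of_variance = {
    "methods": 61,
    "compositions": 17,
    "subjects": 12,
    "order": 7,
    "unknown factors": 3,
}

# The five sources should exhaust the total.
total_percent = sum(percent_of_variance.values())

# Ratio for methods against Fisher's one per cent value (read from Table V).
ratio_methods = 25.7
fisher_one_percent = 4.98
multiple_of_criterion = ratio_methods / fisher_one_percent

print(total_percent)
print(round(multiple_of_criterion, 2))
```

The quotient is a little over five, which agrees with the statement that the ratio for methods is about five times the one per cent value.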



That lengthening the study period decreases the trials at the key- 
board is incontrovertible. Doubling the time produces real economies, 
tripling it does not. It remains to be seen, however, whether the 
amount of this reduction is justified by a doubling and tripling of the 
study period and whether other economies are in proportion to the time 
spent. 

The compositions were all short and simple. They ranged in
playing time from three to five and one-half repetitions per minute 


TABLE V.—ANALYSIS OF THE VARIANCE OF THE LEARNING TRIALS, RELEARNING
TRIALS, AND TRANSCRIPTION ERRORS, SHOWING THE PERCENTAGE OF THE
TOTAL DUE TO EACH VARIABLE, AND A COMPARISON OF THE RATIO OF
THE VARIABLE AND THE EXPERIMENTAL ERROR WITH FISHER’S ONE
PER CENT VALUE


             | Fisher’s one   | Learning trials  | Relearning trials | Transcription errors
Variable     | per cent value | Ratio | Per cent | Ratio | Per cent  | Ratio | Per cent

Subjects     | 2.82           | 5.0   | 12       |       | 55        | 3.7   | 7.9
Methods      | 4.98           | 25.7  | 61       | 0.0   | 0         | 30.9  | 66.4
Compositions | 2.82           | 7.3   | 17       |       | 24        | 8.6   | 18.5
Order        | 4.98           | 3.1   | 7        | 2.6   | 15        |       | 4.9


with an average of four and forty-four hundredths. In three minutes, 
therefore, the average composition could be played about thirteen 
times; in six minutes, twenty-six times. Nevertheless, the three and 
six minute additions to the three-minute study period netted a saving
of only 2.82 and 3.93 trials, respectively. 
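The comparison made in this paragraph can be laid out as a short calculation. The sketch uses the average playing rate and the savings stated in the text; the idea of expressing the saving per repetition that could have been played in the added study time is our own way of presenting the argument, and the names are hypothetical.

```python
REPETITIONS_PER_MINUTE = 4.44      # average playing rate given in the text

# Extra silent-study minutes beyond the three-minute period, and the
# keyboard trials saved relative to method A (Table III).
extra_minutes = {"B": 3, "C": 6}
trials_saved = {"B": 2.82, "C": 3.93}

for method in ("B", "C"):
    playable = REPETITIONS_PER_MINUTE * extra_minutes[method]
    saved_per_playable = trials_saved[method] / playable
    print(
        f"{method}: {playable:.1f} repetitions could be played in the added "
        f"time; {trials_saved[method]} trials saved "
        f"({saved_per_playable:.2f} per playable repetition)"
    )
```

On this reckoning each added minute of silent study buys far less than one keyboard trial, and the return falls further from the six-minute to the nine-minute period.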

This ineconomy, however, could not be held against the longer time 
investment, were other economies produced by it markedly superior. 
Pianists memorize material for permanent retention, and an extra 
investment of learning time is not in and of itself necessarily wasteful. 

Transcription Scores.—In scoring the transcriptions, all omissions 
and errors were totalled, so that the higher the score the less successful 
was the transcription. An entire measure omitted increased the score 
by the sum of all the notes in both left and right hands. 

In Table VI the means of these scores show real differences between 
the state of learning by A and by B, and by A and C. The critical 
ratios of 3.24 and 4.52 are large enough for reliability. 



The interesting question arises here, as it did in the learning trials, 
as to why the difference between B and C is so small. The answer is 
easily found for the transcription scores in Table VII, not so easily 
for the learning trials. The number of perfect and very good scores 
in B (sixteen) and in C (eighteen) are not very different. For this 
simple material and for these capable subjects, the learning was 
practically complete at the end of six minutes, and the additional three 
minutes produced little change in the written quality. However, these 
three minutes were spent in review and rehearsal before playing. Such 
rehearsal involves kinaesthetic preparation, overt or subtle, in addition 
to all the other factors previously described. For all the over- 
learning (roughly equivalent in time to a dozen or so keyboard trials) 
and the formulation of hand patterns in preparation for actual per- 
formance, the difference in keyboard trials between B and C was only 
1.11 trials; hardly a justifiable saving unless, again, some other very 
superior economy inheres. 

Closer analysis of the transcription scores leads to the conclusion 
that the difference in the learning between A and B is quantitative,
and not qualitative. The twenty-nine complete measures omitted 
in A (see Table VII) are not distributed throughout the transcription 
as they would be were the impression of the whole vague and frag- 
mentary, but are specifically located away from the beginning, toward 
the middle and the end; or the right hand is almost intact with the left 
hand unfinished. Stated differently, every subject studied the score 
throughout, then started from the beginning to learn thoroughly and 
memorize, learning either the right hand thoroughly and then adding 
the left, or learning both together, unit by unit. 

This is clearly supported in Table IV. The most capable subjects 
were able to complete the learning in B and so show a very large gain 
over A, with little gain in C. The least capable ones, however, adding 
less in the second three minutes, gained relatively not so much, but 
kept gaining through the third three minutes. For these subjects, 
another three minutes of study, or twelve minutes in all, would 
probably have produced a proportional number of very good or perfect 
transcriptions.*


* Though this experimental group was carefully chosen for ability and seems 
fairly homogeneous (Table V shows the variation among subjects to be only 
twelve and eight per cent of the whole), the best subjects accomplished twice the 
amount of learning in a given time and doubled the learning speed on the key- 
board. (In Table I the range in learning trials is 3.77 to 8.55.) This estimate is 
supported by correlations of .71 between learning trials and transcription scores, 
and of .79 between learning and relearning trials. 




The variance of the transcription scores shows approximately the 
same proportional values as the variance of the learning trials. Meth- 
ods are again most important, accounting for sixty-six per cent of the 
whole. The subjects contribute only seven and nine-tenths per cent. 

Relearning Trials.—The relearning trials present by far the most
interesting implications in the experiment. The methods reveal no 


TABLE VI.—MEANS AND STANDARD DEVIATIONS OF THE TRANSCRIPTION SCORES
AND THE SIGNIFICANCE OF THE DIFFERENCES OF THE MEANS


Method                              | Mean  |  SD  | Diff. of means | SE diff. | Critical ratio
A—three-minute study period.......  | 24.22 | 19.2 | A-B, 14.56     |   4.49   |     3.24
B—six-minute study period.........  |  9.66 | 13.2 | A-C, 17.92     |   3.96   |     4.52
C—nine-minute study period........  |  5.52 |  6.3 | B-C,  4.14     |   2.86   |     1.45


TABLE VII.—ANALYSIS OF TWENTY-SEVEN TRANSCRIPTIONS BY EACH METHOD
SHOWING THE NUMBER PERFECT, THE NUMBER WITHIN SIX ERRORS, AND
THE MEASURES OMITTED*


                                      |         | Within     |   Measures omitted
Method                                | Perfect | six errors |  RH | LH | Complete
A—three minutes pre-study...........  |    0    |     3      |   1 | 13 |   29
B—six minutes pre-study.............  |    6    |    10      |   0 |  3 |    8†
C—nine minutes pre-study............  |   11    |     7      |   3 |  2 |    4


*The nine compositions totalled 59 measures, the nine subjects writing 531 
measures in all. 

† Of the eight complete measures omitted in B, one sample is responsible for
five.


superiorities whatever in retention value (Table VIII). The means 
differ by negligible amounts; the standard deviations indicate no con- 
sistent trend; the critical ratios are, of course, extremely small. The 
variance analysis (Table V) proves that the methods played no part 
in the variations from the mean and that these latter were largely due 
to the subjects and in some measure to the differences in the com- 
positions. The writer has no explanation to offer for the rather large 
contribution of fifteen per cent for order. 
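The critical ratios cited here and in Table VI appear to be each difference of means divided by its standard error. The sketch below is an illustration only, assuming that definition and using the figures printed in Table VI; the function name `critical_ratio` is hypothetical and not the author's.

```python
def critical_ratio(diff_of_means: float, se_diff: float) -> float:
    """Return the critical ratio: a difference of means over its standard error."""
    if se_diff <= 0:
        raise ValueError("standard error must be positive")
    return diff_of_means / se_diff

# Differences and standard errors as printed in Table VI (transcription scores).
table_vi = {"A-B": (14.56, 4.49), "A-C": (17.92, 3.96), "B-C": (4.14, 2.86)}

ratios = {pair: critical_ratio(d, se) for pair, (d, se) in table_vi.items()}
for pair, cr in ratios.items():
    print(f"{pair}: {cr:.2f}")
```

The computed values agree with the printed critical-ratio column to within rounding in the second decimal place.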

Summary.—Keyboard trials are significantly reduced when the 
preliminary study period is doubled and tripled, but the triple period 
offers no advantage over the double one. The trials saved, however,
are not more than one-fourth the number that could have been run on
the keyboard during the second three minutes and not more than
one-sixth of the possible number during the added six-minute period. 
Doubling the period produced reliably better transcriptions, the 
improvement being specifically due to amount written, not to kind. 
Tripling the period produced no reliably better transcriptions. The 
triple period did, however, allow for a considerable amount of over- 
learning, mental rehearsal and kinaesthetic preparation. * 


TABLE VIII.—MEANS AND STANDARD DEVIATIONS OF THE RELEARNING TRIALS
AND THE SIGNIFICANCE OF THE DIFFERENCES OF THE MEANS


Method                              | Mean |  SD  | Diff. of means | SE diff. | Critical ratio
A—three-minute study period.......  | 4.78 | 1.62 | A-B,  .08      |    .5    |      .16
B—six-minute study period.........  | 4.70 | 2.15 | A-C, -.07      |    .5    |      .14
C—nine-minute study period........  | 4.85 | 1.88 | B-C, -.15      |    .5    |      .30


Since the six- and nine-minute study periods produced about the 
same large number of perfect and nearly perfect transcriptions (about 
two-thirds of the total), the learning after six minutes might be said 
to have been complete, or at the one hundred per cent stage. It can 
then be concluded that the fifty per cent, the hundred per cent, and 
the one hundred and fifty per cent learning stages, with a difference
of less than four keyboard trials between the first and last, produce no
differences in retention. In six learnings (of twenty-seven) after six
minutes of study, a perfect performance was reached within two trials; 
after nine minutes, nine perfect performances within two trials; hardly 
a gain for three extra minutes of rehearsal. 

The most rewarding investment of time for preliminary study
would seem to be that which allows for thorough analysis of the
material and a very intensive preparation of some small beginning unit
which may then be completed at the keyboard; the addition of successive
units each intensively studied and brought to memorized performance;
and the welding of these units into a whole.† The extra
effort required mentally to carry and rehearse wholes or even larger
units apparently offers no compensations.


* Kovacs⁴ found intensive mental rehearsal superior to keyboard practice for
strengthening the memorizing, but this was after the memorizing had already been
accomplished at the keyboard.

† The small-unit method has already been found as efficient as working with
wholes.




SUMMARY AND CONCLUSIONS 


The common practice of memorizing music through mechanical 
reading trials at the piano has been found inferior to immediate key- 
board memorizing superimposed on thorough and intensive analytical 
study. The implications of the advantages of the latter procedure 
might logically point to such exhaustive preliminary study as would 
insure perfect memorized keyboard performance of even long com- 
positions within one or two trials. 

To test the efficiency inherent in study periods of various lengths, 
small though complete compositions of five to eight measures serve 
adequately as samples qualitatively similar to longer material. 

Nine such samples of unfamiliar piano music were studied for 
three, six, and nine minutes before continuing the memorizing to 
perfect performance at the keyboard. At the end of the study period,
the material was transcribed from memory as a check on the stage 
of the learning. Two weeks later it was relearned to gauge the relative 
retention values of the several amounts of preliminary study. 

Nine adults, all competent musicians and skillful pianists, each
performed the experiment three times, so that twenty-seven learnings 
resulted from each method, or eighty-one learnings in all. The 
variables of subjects, compositions, and order of presentation were 
rotated so that they weighed equally in the three contrasted methods. 

The amount of material transcribed at the end of six minutes was 
reliably greater than at the end of three; because the music lay well 
within the capacity of the subjects the transcriptions at the end of nine 
minutes were not significantly better than at the end of six. In addi- 
tion, the differences in the scores were quantitative, not qualitative. 
At the end of three minutes roughly half the total number of measures 
was written correctly, either complete right and left hand units from 
the beginning, or the right hand alone with the left hand begun. 

Sixteen transcriptions of twenty-seven were perfect or within six 
errors after six minutes, eighteen after nine minutes; six learnings were 
completed at the keyboard within one or two trials after six minutes, 
nine after nine minutes. The small differences in both transcriptions
and learning trials after six and nine minutes support the inference
that the additional three minutes of overlearning utilized in mental 
rehearsal were relatively ineffective. 

The number of keyboard trials required to reach the criterion of 
perfect memorized performance after three minutes of study was 
reliably reduced after six by 2.82 and after nine by 3.93. But the
difference between six and nine minutes reduced the trials by only 1.11,
which is not statistically significant. The reduction of 2.82 trials 
after six minutes becomes less impressive, however, when it is known 
that the compositions could be run on the keyboard roughly thirteen 
times in three minutes and twenty-six times in six minutes. 
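The comparison just drawn can be checked with a few lines of arithmetic. The sketch below merely restates the figures quoted in the text (trials saved, and the approximate number of keyboard run-throughs possible in the added study time); the variable names are illustrative.

```python
# Figures quoted in the text: trials saved relative to the three-minute method,
# and the approximate number of keyboard run-throughs possible in the added time.
saved_after_six, saved_after_nine = 2.82, 3.93
runs_in_three_min, runs_in_six_min = 13, 26

share_six = saved_after_six / runs_in_three_min
share_nine = saved_after_nine / runs_in_six_min

print(f"six-minute study: {share_six:.3f} of the possible run-throughs saved")
print(f"nine-minute study: {share_nine:.3f} of the possible run-throughs saved")
```

Both shares fall under the bounds given in the earlier summary, namely one-fourth for the second three minutes and one-sixth for the added six-minute period.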

The most serious inefficiency of the extended study period is its 
lack of effect on retention. No differences exist here. Neither the 
extra preparation to bring the material to completion in writing nor 
the added mental rehearsal after the material was learned shows any 
superiority over a thorough analysis and a readiness of some part less 
than the whole. 

It might then be recommended that piano students study the whole 
composition for details of structure and form, then study some unit 
of comfortable length sufficiently to attempt it from memory at the 
keyboard. The completion of such units and the subsequent welding 
into a whole will prove more efficient than attempts mentally to 
memorize and carry too large units. 


BIBLIOGRAPHY 


(1) Ebbinghaus, H.: “Ueber das Gedächtnis.” Untersuchungen zur Experimentellen
Psychologie. Duncker und Humblot, Leipzig, 1885.

(2) Fisher, R. A.: Statistical Methods for Research Workers. Oliver and Boyd,
Edinburgh, 1934.

(3) Gates, A. I.: “Recitation as a Factor in Memory.” Archives of Psychology,
Vol. VI, No. 40, p. 104.

(4) Kovacs, S.: “Untersuchungen über das musikalische Gedächtnis.” Zeit. f.
ang. Psych., Vol. XI, 1916, pp. 113-135.

(5) Porteus, S. D.: The Maze Test and Mental Differences. Smith Publishing
Co., Vineland, N. J., 1933.

(6) Rubin-Rabson, G.: “The Influence of Analytical Pre-study in Memorizing
Piano Music.” Archives of Psych., 1937, No. 220.

(7) Rubin-Rabson, G.: “Studies in the Psychology of Memorizing Piano Music.
I. A Comparison of the Unilateral and the Coordinated Approach.”
J. Educ. Psych., Vol. XXX, 1939, pp. 321-345.

(8) Rubin-Rabson, G.: “Studies in the Psychology of Memorizing Piano Music.
II. A Comparison of Massed and Distributed Practice.” J. Educ. Psych.,
April, 1940.

(9) Rubin-Rabson, G.: “Studies in the Psychology of Memorizing Piano Music.
III. A Comparison of the Whole and the Part Approach.” J. Educ.
Psych., September, 1940.

(10) Twitmyer, E. M.: “Visual Guidance in Motor Learning.” Amer. Jour.
Psych., Vol. XLIII, 1931, pp. 165-187.

(11) Witasek, S.: “Ueber Lesen und Rezitieren in ihren Beziehungen zum Gedächtnis.”
Zeit. f. Psych., Vol. XLIV, 1907, pp. 161-185, 246-282.




A PRELIMINARY REPORT ON THE DEVELOPMENT 
AND STANDARDIZATION OF A NON-VERBAL TEST 
AT THE HIGH-SCHOOL LEVEL* 


ANDREW W. BROWN AND ROBERT BLAKEY 
Institute for Juvenile Research, Chicago 
INTRODUCTION 


A survey of non-verbal group tests of intelligence in general use 
will reveal but few that are applicable to high-school students and 
college freshmen. Those which can be used at these higher levels of
ability are, as a whole, unsatisfactory, largely because they fail to 
tax the so-called higher mental processes. 

The main purpose of a non-verbal test is to give a measure of 
so-called general intelligence that is relatively free from the influence 
of a language deficiency whether the result of a foreign language 
environment, or a reading handicap, or some other factor. The 
possibility of predicting success in academic courses in high school by 
a non-verbal test is beside the point. In any case a non-verbal test 
should prove a valuable auxiliary measure to a verbal test in identify- 
ing those cases which may be retarded by a lack of language or verbal 
ability. Also, the greater the number of independent measures 
available, the more accurate the picture of the student will be, and, 
other things being equal, the more valuable will be a counseling program.

Our purpose has been to construct and standardize a non-verbal 
test for use at the high-school level which will adequately represent
the higher mental abilities such as inductive and deductive reasoning, 
perception, and the recognition of spatial relations, all of which empha- 
size the higher intellectual processes of analysis and integration. The 
lack of tests to measure these latter processes is the most serious 
deficiency in most non-verbal batteries which may be used at this level. 

It will be observed that, both in the theory of construction and
in the selection of the test material, the authors are heartily indebted
to Prof. L. L. Thurstone.


* Studies from the Institute for Juvenile Research, Chicago, Paul L. Schroeder,
M.D., Director. Series C, No. 311.
The authors are indebted to W.P.A. for both clerical and technical assistance
in this work. We are especially indebted to Mr. Joseph Forpanek for making the
drawings.




A. THE TEST USED 


The experimental battery consisted of eleven tests. Some of these 
were modified forms of tests used by Thurstone in his investigations 
into the factorial components of intelligence. Other tests, six, ten, 
and eleven, were devised by one of the authors (Blakey). The tests 
were arranged so that instruction and fore-exercises were on one page 
and the test proper on another which the subject could not see without 
turning the page. This arrangement eliminated a manual of direc- 
tions for giving the examination and also prevented the subjects from 
getting started before the signal “go” was given.

Eleven tests were used in the experimental battery, and, although 
it was found necessary to drop several of these from the final form, a 
description of all eleven will be given here. 

Test 1. Manikin.—The “Manikin” test is designed to test perceptual
speed. There is a page of pied line-figure drawings of men,
with variations in the positions of the arms and legs. The problem 
is to draw a ring about each manikin with its arms and legs in the same 
position as in a model at the top of the page. About thirty of the 
two hundred forty figures are like the model. 

Test 2. Identical Patterns.—The “Identical Patterns” test is also 
designed to measure perceptual speed or mental alertness. Each 
pattern is composed of four overlapping geometrical figures—two 
triangles and two circles. The size of the figures is the same in each 
pattern but there are twenty-four different arrangements of positions 
of the component triangles and circles and differences of dotted and 
solid line. The patterns are arranged in rows across the page. The 
problem is to mark the patterns in each row that are exactly like the 
first one in that row. The models are separated from the rest of 
the row by a vertical line. This test is a variation of the “Identical 
Forms” test used by Thurstone in his study of primary mental 
abilities. 

Test 3. Fitting Parts.—The “Fitting Parts” test is a variation
of the form-board test and is devised to test spatial relations ability. 
Instead of the usual single discrimination of shape this test has a 
double discrimination of shape and size. The three parts of a figure 
are given at the left. The problem is to indicate which of four adjacent 
figures could be cut to make the parts. The choices represent two 
different shapes and two different sizes of each shape for each set of 
stimuli. The test has twelve items. 




Test 4. Opposite Sides—The “Opposite Sides” test is also 
designed to test spatial relations ability. Each item has three small 
pennants the shape of right triangles. Two of the pennants show the 
same side. The other pennant shows the opposite side. The problem 
is to mark the pennant which shows the opposite side from the other 
two. It is believed that inductive reasoning may also be present in 
this test. 

Test 5. Code.—The “Code” test is another test to measure
perceptual speed. However, as we did not wish to make use of 
numbers in this test, it was necessary to have two sets of meaningless 
variables in the code. A duo-decimal system was used in the upper 
half of each code box and a variant of a Roman numeral system was 
used in the lower half. In this test, a code is given at the top of the 
page and five rows of the code boxes are presented underneath. Both 
halves of each box are filled in, but some of the pairs are made incor- 
rectly. The subject must mark each code box that does not agree 
with the model at the top of the page. There are forty-five items in 
the test. 

Test 6. Circle Grouping.—The “Circle Grouping” test is a new 
test of non-verbal reasoning of the inductive type. There are four 
squares in a row. Each square has a number of clusters of small 
circles in it. The arrangement of the clusters changes from square 
to square. One of the circles in each of the first three squares is black. 
The problem is to educe the rule by which the circles were blackened 
and blacken the corresponding circle in the fourth square. There are 
twelve items in the test. 

Test 7. Form Series.—The “Form Series” test is of the conventional
series type except that the figures are non-meaningful, only
three figures are used, and the figures in all of the items are the same.
The response is of the three-alternative, multiple-choice type. The
correct figure to go into a blank space in the series is to be indicated.
There are twenty-two items in the test.

Test 8. Circle Reasoning.—The “Circle Reasoning” test is a 
variant of the Yerkes multiple-choice problem and an adaption and 
simplification of the “‘Marks”’ test used by Thurstone? in his study of 
primary mental abilities. The ability necessary for performance of 
this test is, according to Thurstone’s analysis, of the inductive reason- 
ing type. There are five rows of circles and dashes. One circle of 
each of the first four rows is black. The examinee finds the rule by 
which the circles in the first four rows were blackened and marks the
circle which should be black in the bottom row. There are fifteen
items in the test. 

Test 9. Form Relations.—The “Form Relations” test is a test 
of deductive reasoning. The test is a variation of the “Analogies”
test by Thurstone. Three stimulus figures are given and a five-item
multiple choice response. The problem is to indicate the figure that 
has the same relation to the third figure as the second figure has to the 
first. There are sixteen items in the test. 

Test 10. Form Reasoning.—The “Form Reasoning” test is a new
test of the deductive reasoning type. The problem is to combine forms
according to rules given at the top of the page and to indicate which 
of five alternatives is the correct answer. There are twelve items in 
the test. 

Test 11. Form Syllogisms.—The “Form Syllogisms” test is
another new test of deductive reasoning. Figures may be combined 
according to two rules. One figure of a pair connected by a dotted 
line may have a line drawn under it, in which case the figure with the 
line under it represents the larger amount. Or the two figures may be 
connected by the equal sign in which case both figures represent equal 
amounts. Two of the pairs of figures are given as “facts” and two 
other pairs are given as “conclusions.” According to the facts, only
one of the conclusions is correct, and the problem is to mark the correct 
conclusion. There are eighteen items in the test. 

Because of the difficulty of giving the instructions and because it 
took too long to give all eleven tests, Code, Form Series, and Form
Syllogisms were omitted from the final battery. The remaining eight
subtests can be given in the usual forty-minute class period. They 
may be summarized as follows: Two tests of perceptual speed, two 
tests of spatial relations, four tests of abstract reasoning. 


B. ESTABLISHING TIME LIMITS 


Time limits on the test were established by giving the test to 
thirty-five of the highest ranking students in the senior class in one 
of the large Chicago high schools. Each test was administered to this 
group. As soon as a student had finished the test he held up his 
hand and the examiners (the authors) called off the time which he 
put on his test blank. The time limit set was such that only a few 
of this highly selected group were able to finish the test in the allotted 
time. The time limits thus established have proven to be quite 
satisfactory. 




C. THE PRELIMINARY TRY-OUT 


The group selected for the preliminary investigation of the tests 
consisted of two hundred eighty-six students—one hundred fifty-four 
boys and one hundred thirty-two girls in the high school at Hinsdale, 
Illinois. The age, grade and IQ distributions were as follows: 


Age | Number | Grade | Number | Otis IQ* | Number
Below 15......... 43 | Freshman........ 96 EY 20 
15 to 15-11....... 100 | Sophomore....... 120 | 100-119.......... 139 
16 to 16-11....... 58 | 120 and up....... 69 
17 to 17-11....... 39 | Seniors........... 12 


* Some of the subjects had not been given the Otis test. 


D. ADMINISTERING THE TEST 


All of the tests for this preliminary investigation were administered 
by the authors. In general the testing conditions were satisfactory. 
The examiners read the instructions in the usual manner and gave 
ample time for the completion of the fore-exercise. The subjects were 
then told to turn the page, to read the abbreviated directions at the 
top of the page, and to begin. Time was accurately recorded with a 
stop watch. 


E. SCORING


After careful consideration of the advantages and disadvantages 
of different scoring formulae for the different tests, it was decided to 
use scoring formulae only in tests 3, 4, and 7. No special weighting 
methods were found valuable. The variances of the different subtests 
were sufficiently close to make the effect of such methods negligible. 


F. VALIDITY AND RELIABILITY


The usual method of determining the validity of intelligence tests 
is to show how closely they correlate with other supposed tests of 
intelligence. This is a highly questionable procedure, and would be 
particularly so in the present case. This test is not designed to test 
the whole range of intellectual factors. Although a factorial analysis 
has not been done, it seems relatively clear from comparisons with
other tests that memory, number ability, and verbal ability are not
represented in this test. Comprehension, speed of perception, visuali- 
zation, and particularly inductive and deductive reasoning, we believe 
are represented. And, of course, only in so far as factors are common 
to any two tests will they correlate. Because comprehension,
perception and reasoning are processes independent, to a
large degree, of content, it would be expected that the Non-Verbal 
Test would correlate only to an intermediate degree with verbal intelli- 
gence tests. Also as high-school grades are dependent on memory, 
verbal, and number abilities as well as comprehension, reasoning, 
perception, and visualizing abilities, one would expect a higher corre- 
lation of a verbal test with school grades in proportion to the impor- 
tance of the three former factors. As has been mentioned before, 
however, this test or any other test should not be used alone to 
predict success in high school. 


TABLE I.—INTERCORRELATIONS


7] 8] 
2. Identical patterns............. 24|... 0517 240) 19,392 23 
4. Opposite sides................ .26 
6. Circle grouping............... . 19}. 46). 20) .25). 26). . .|.48].50}.53 
. 13). 16). 13) . 38) . 22). 48). . .|.35).52).54) .32 
8. Circle reasoning.............. 19). 15}. 10) . 32) . 25) . 50) .35). . .|.55).38) .31 
9. Form relations................ .33 30| 35) .39 
10. Form reasoning............... . 19}. 24} .35 
. 45). 54/38 60}.52 . 70} .60 


The basic theory underlying the test is that intellect is composed 
of several special abilities or factors* and the tests of this battery were 
set up with this concept in mind and were formulated a priori so as 
to contain the reasoning, perceptual, speed, and visualizing factors. 
In some cases, the tests used were variations of tests found to contain 
these factors in multiple factor analyses. In other cases, new tests 
were made to satisfy the definition of a process, such as induction or
deduction. We feel that this constitutes one legitimate method of
validation.

* One of the authors, Mr. Blakey, has worked for several years in Prof. Thurstone’s
Laboratory and is familiar with the concepts of factorial analysis.

In the case of the reasoning subtests, a further validation would be 
shown by relatively high intercorrelations among themselves. Table I 
gives the table of intercorrelations of the subtests and total score. 

The correlations were computed with the aid of tetrachoric computing
diagrams. A population of two hundred eighty-six subjects was
used. The divisions for the four-fold tables were made near the means 
so that the proportions and hence the correlations should be relatively 
stable. 
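The computing diagrams themselves cannot be reproduced here. As a rough stand-in, the sketch below uses the familiar cosine-pi approximation to the tetrachoric coefficient, which is most accurate when both variables are divided near their medians, as was done in this study. This is a substituted technique, not the authors' procedure; the function name and the cell counts in the example are hypothetical.

```python
import math

def tetrachoric_cos_pi(a: int, b: int, c: int, d: int) -> float:
    """Cosine-pi approximation to the tetrachoric correlation.

    a and d are the concordant cells of a fourfold table (both high, both low);
    b and c are the discordant cells.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    if b * c == 0:
        # No discordant pairs: perfect association, unless the table is degenerate.
        return 1.0 if a * d > 0 else float("nan")
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))

# Hypothetical fourfold table for 286 subjects split near the means of two subtests.
r = tetrachoric_cos_pi(95, 48, 48, 95)
print(f"approximate tetrachoric r = {r:.2f}")
```

Because the approximation depends only on the cross-product ratio ad/bc, it degrades when the dividing points lie far from the medians, which is one reason for dividing near the means as the authors did.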

From Table I it is evident that the reasoning tests, 6, 8, 9, and 10,
correlate more closely among themselves than with the perceptual and 
visualizing tests or than the latter with each other. This suggests a 
definite communality of factors among these tests, and as tests 6 and 10 
have never been included in a battery for factor analysis such a proof 
of communality is encouraging. 

The results of the test have also been correlated with school grades 
and IQ’s on the Otis Self-Administering tests. Both the grades and 
the Otis IQ’s were secured from the school files. 

In computing the average grade for each child a grade of “A” was
given one point, a grade of “B” two points, a grade of “C” three
points, etc. Each student was registered for four courses. For example,
a student who had a C average would have a total of twelve points.
Also if an Otis Self-Administering Test had been given, the IQ was
tabulated. The correlation between school grades and the “Non-Verbal
Reasoning Test” was .47. The correlation between Otis IQ
and school grades was .60. The correlation between the “Non-Verbal
Reasoning Test” and Otis IQ was .59.
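The grade tally described above may be sketched as follows. The text gives only A = 1, B = 2, C = 3, "etc."; the points assigned here to D and F, and the function name, are assumptions made for illustration.

```python
# Points per letter grade; A, B, and C follow the text, D and F are assumed.
GRADE_POINTS = {"A": 1, "B": 2, "C": 3, "D": 4, "F": 5}

def grade_score(grades):
    """Sum the points for a student's four course grades (lower is better)."""
    if len(grades) != 4:
        raise ValueError("each student was registered for four courses")
    return sum(GRADE_POINTS[g] for g in grades)

print(grade_score(["C", "C", "C", "C"]))  # a straight-C record totals 12 points
```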

The reliabilities of the different tests were determined by the odd- 
even method and corrected for double length with the Spearman-Brown 
formula. Tetrachoric correlations were used for this purpose and 
were computed with the aid of tetrachoric correlation diagrams. 
These reliability coefficients are presented in Table II. 
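The Spearman-Brown step-up for a test of double length is r' = 2r / (1 + r), where r is the correlation between the odd and even halves. A minimal sketch follows; the odd-even correlation in the example is hypothetical and is not a value from Table II.

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Estimate the reliability of a test `factor` times as long as the part
    whose correlation is r_half (factor=2 gives the double-length correction)."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)

# Hypothetical odd-even correlation, for illustration only.
r_odd_even = 0.80
print(round(spearman_brown(r_odd_even), 3))
```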

Although the tests were very short both in number of items and 
time for administration, it is evident from this table that, with the 
exception of test 3, the tests are fairly reliable and that the test as a 
whole has as high a reliability coefficient as most standardized tests 
in general use. The significance of this high reliability is somewhat 
enhanced by the fact that the children were all at the high-school level 
which would reduce the age range. There was a correlation of -.13
between chronological age and total score of the test and there was no
significant difference between the means of the four grades used.
(See Table III.) The standard error of the total raw score is 4.22, of
the derived score, 2.55. 


G. NORMS 


Table II gives the means, the standard deviations and the reliabili- 
ties of the subtests, also the means for the total score, the Otis IQ, 
the grade score, and the chronological age for the total Hinsdale group. 


TABLE II.—MEAN SCORES, STANDARD DEVIATIONS AND RELIABILITIES
OF SUBTESTS


Variable | Mean | SD  | Reliability
         | 10.8 | 3.9 |    .94


Grade means and standard deviation are given in Table III. 

It is evident that in this population there are no statistically 
significant differences in either mean or standard deviation between 
the various grades, nor are there any consistent trends of differences. 
The total group was, therefore, used in computing the norms for the 
derived scores and percentile ranks. 

In Table IV are the percentile ranks corresponding to the mid- 
points of the class intervals indicated. 
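The paper does not state how the midpoint percentile ranks were computed. The sketch below assumes the usual convention (cases below the interval plus half of those within it, divided by the total), which agrees with nearly all of the legible entries of Table IV to two decimals. The frequencies are those printed in Table IV; the function name is illustrative.

```python
# Frequencies from Table IV for class intervals 30-34.9, 35-39.9, ..., 165-169.9.
frequencies = [1, 1, 4, 3, 7, 6, 3, 15, 13, 16, 24, 20, 21, 16,
               25, 22, 26, 23, 12, 6, 7, 5, 3, 3, 1, 1, 2, 0]
lower_bounds = list(range(30, 170, 5))

def midpoint_percentile_ranks(freqs):
    """Proportion of cases below each interval plus half of those within it."""
    total = sum(freqs)
    ranks, below = [], 0
    for f in freqs:
        ranks.append((below + f / 2) / total)
        below += f
    return ranks

ranks = midpoint_percentile_ranks(frequencies)
for lo, f, pr in zip(lower_bounds, frequencies, ranks):
    print(f"{lo}-{lo + 4.9:.1f}  F={f:2d}  PR={pr:.3f}")
```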

Derived scores (DS) are given in Table V. This set of tentative 
norms is applicable to persons fourteen years of age or older. Norms 
for younger age groups have not been established. The mean of the 
derived scores has been arbitrarily set at 100 and the standard deviation
at 15.
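The conversion formula is not given in the text. Assuming a simple linear standard-score transformation based on the total-group raw mean (96.6) and standard deviation (24.3) of Table III, the sketch below gives the same values as Table V for the entries shown; a few printed entries elsewhere in the table differ from this formula by one point, so it should be read as an approximation and not as the authors' exact procedure.

```python
RAW_MEAN, RAW_SD = 96.6, 24.3   # total-group values from Table III
DS_MEAN, DS_SD = 100.0, 15.0    # target scale stated in the text

def derived_score(raw: float) -> int:
    """Linear standard-score conversion, rounded to the nearest whole point."""
    z = (raw - RAW_MEAN) / RAW_SD
    return round(DS_MEAN + DS_SD * z)

for raw in (20, 96, 110, 169):
    print(raw, derived_score(raw))
```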




In interpreting these scores, one may compare them with IQ’s. 
The mean IQ is by definition 100 and various standard deviations have 
been reported on different tests and in different studies on the same 
test. Standard deviations of IQ on the Stanford-Binet and Revised 


TABLE III.—MEAN SCORE FOR HIGH-SCHOOL GRADES


Variable    | Mean age (years) | Mean score |  SD  |  N
Freshmen    |       15.2       |    95.1    | 25.4 |  96
Juniors     |       17.2       |    95.3    | 21.8 |  58
Seniors     |       18.3       |   101.7    | 23.6 |  12
Total group |       16.1       |    96.6    | 24.3 | 286
1] 1.3 
TABLE IV.—PERCENTILE RANKS (RAW SCORES)

Class intervals |  F |  PR   | Class intervals |  F |  PR
30-34.9         |  1 |  .002 | 100-104.9       | 25 |  .57
35-39.9         |  1 |  .01  | 105-109.9       | 22 |  .65
40-44.9         |  4 |  .02  | 110-114.9       | 26 |  .73
45-49.9         |  3 |  .03  | 115-119.9       | 23 |  .82
50-54.9         |  7 |  .04  | 120-124.9       | 12 |  .88
55-59.9         |  6 |  .07  | 125-129.9       |  6 |  .91
60-64.9         |  3 |  .08  | 130-134.9       |  7 |  .94
65-69.9         | 15 |  .11  | 135-139.9       |  5 |  .96
70-74.9         | 13 |  .16  | 140-144.9       |  3 |  .97
75-79.9         | 16 |  .21  | 145-149.9       |  3 |  .98
80-84.9         | 24 |  .28  | 150-154.9       |  1 |  .99
85-89.9         | 20 |  .36  | 155-159.9       |  1 |  .99
90-94.9         | 21 |  .43  | 160-164.9       |  2 |  .997
95-99.9         | 16 |  .50  | 165-169.9       |  0 | 1.000


Stanford-Binet tests have been reported ranging from twelve to sixteen 
points in various studies, but there seems to be a tendency for most 
of the standard deviations to be about fifteen points. If fifteen points 
were the true standard deviation of the Stanford-Binet tests, then
according to the normal probability table, 15.9 per cent of the general
population would have scores lower than 85 IQ and 2.3 per cent would 
have IQ’s lower than 70 on the Stanford-Binet test. In the same 
fashion, a DS of 85 on the Non-Verbal Reasoning Test would mean 
that 15.9 per cent of the experimental population made lower scores, 


TABLE V.—DERIVED SCORES (DS)


Total  DS   Total  DS   Total  DS   Total  DS   Total  DS
score       score       score       score       score

20 53 50 71 80 90 110 108 140 126 
21 53 51 72 81 90 111 109 141 127 
22 54 52 72 82 91 112 109 142 128 
23 55 53 73 83 91 113 110 143 128 
24 55 54 73 84 92 114 110 144 129 
25 56 55 74 85 93 115 111 145 130 
26 56 56 75 86 93 116 112 146 130 
27 57 57 75 87 94 117 112 147 131 
28 58 58 76 88 95 118 113 148 131 
29 58 59 76 89 95 119 114 149 132 
30 59 60 77 90 96 120 114 150 133 
31 59 61 78 91 96 121 115 151 133 
32 60 62 78 92 97 122 115 152 134 
33 61 63 79 93 98 123 116 153 135 
34 61 64 80 94 98 124 117 154 135 
35 62 65 80 95 99 125 117 155 136 
36 62 66 81 96 100 126 118 156 136 
37 63 67 81 97 100 127 119 157 137 
38 64 68 82 98 101 128 119 158 138 
39 64 69 83 99 101 129 120 159 138 
40 65 70 83 100 102 130 120 160 139 
41 65 71 84 101 103 131 121 161 140 
42 66 72 85 102 103 132 122 162 140 
43 67 73 85 103 104 133 122 163 141 
44 67 74 86 104 104 134 123 164 142 
45 68 75 86 105 105 135 123 165 142 
46 69 76 87 106 106 136 124 166 143 
47 69 77 88 107 106 137 125 167 143 
48 70 78 88 108 107 138 125 168 144 
49 70 79 89 109 108 139 126 169 145 


and a DS of 70 would mean that according to the normal probability
table 2.3 per cent of the experimental population would be expected
to make lower scores.

Wechsler,⁵ in establishing norms for the Wechsler-Bellevue Intelligence
Test, has followed a similar procedure. This test is set up so
that fifty per cent of the experimental group has IQ’s between 90
and 110. On the “Non-Verbal Reasoning” test similarly fifty per
cent of the experimental population made DS’s between 90 and 110.
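The percentages quoted from the normal probability table can be checked directly. The sketch below assumes a normal distribution with mean 100 and standard deviation 15, as the derived scores are defined; the function name is illustrative.

```python
import math

def normal_cdf(x: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Proportion of a normal distribution falling below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

below_85 = normal_cdf(85)
below_70 = normal_cdf(70)
between_90_and_110 = normal_cdf(110) - normal_cdf(90)

print(f"below 85: {below_85:.3f}")
print(f"below 70: {below_70:.3f}")
print(f"between 90 and 110: {between_90_and_110:.3f}")
```

Under these assumptions the proportion between 90 and 110 comes out very slightly under one-half, which is consistent with the round figure of fifty per cent used in the text.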

Hence, although the MA concept is not introduced, a valid scale of 
performance with statistical properties quite similar to the IQ is pro- 
vided. The fallacies inherent in any formula involving the mental-age 
concept near the end of the learning curve are accordingly absent in 
the derived score. 

As the experimental population was a high-school group and was 
necessarily selective in that many of the poorer students never reach 
high school for one reason or another, or drop out before the end of 
the ninth grade, the interpretation placed on the DS should be in 
terms of a general high-school population and not of the general
population. In so far as this is true, the DS will as a rule be lower than
an IQ.


SUMMARY 


(1) A series of eleven subtests of a non-verbal nature involving 
perceptual speed, space relations, and inductive and deductive reason- 
ing has been constructed on the basis of the concepts of primary mental 
abilities. 

(2) This series of tests has been standardized on a group of two
hundred eighty-six students of a suburban high school.

(3) The tests are found to be reliable and valid for the purpose in 
mind. 

(4) Tentative norms, both percentile ranks and derived scores (DS),
are given, the latter to take the place of IQ’s at this level.

(5) It is recognized that the present norms are inadequate and 
standardization on a much larger sample is being undertaken. 


BIBLIOGRAPHY 


1. Thurstone, L. L.: Primary Mental Abilities. University of Chicago Press, 1938. 

2. Thurstone, L. L.: “The Perceptual Factor.” Psychometrika, Vol. III, 1938.

3. Thurstone, L. L.: Unpublished study of an experiment at Hyde Park High 
School. 

4. Chesire, Leone, Saffir, Milton, and Thurstone, L. L.: Computing Diagrams for 
the Tetrachoric Correlation Coefficient. 

5. Wechsler, David: The Measurement of Adult Intelligence. Williams and Wilkins 
Co., N.Y., 1939. 


A COMPARISON OF THE VOCATIONAL INTERESTS 
OF NEGRO AND WHITE HIGH-SCHOOL STUDENTS 


PAUL WITTY, SOL GARFIELD, AND WILLIAM G. BRINK 
Northwestern University 


A knowledge of the vocational interests of students may prove of 
great value to teachers in their efforts to understand and meet children’s
needs. For it is generally conceded that the objectives of a
sound educational program should include an effective concern for
children’s vocational interests as well as for their health, use of leisure,
home and community relationships, and social competency in demo- 
cratic life. The former objective is of utmost significance since the 
“work compulsion” occupies such an important rôle in determining the
satisfactions which any person may obtain from life. Moreover, 
present-day economic and social trends reveal new vocational oppor- 
tunities as well as alterations and disappearances of old patterns. The 
increase in unemployment during the past two decades presents a 
problem which involves new interpretations of vocational opportu- 
nities. Certainly, serious concern for the guidance of youth in this 
important area can not proceed without knowledge of the hopes and 
aspirations of youth, however transitory, mistaken, or false they may 
prove to be. 

Too often the student’s vocational interests are unrealistic; they 
are arrived at by the pupil without adequate understanding of himself 
or of society. “When students who have neither studied occupations
systematically nor thought very seriously about their own aptitudes
are asked to report their occupational choices, their expressions of
preference bear only a scant resemblance to the fields of employment
they ultimately enter.”¹ Accordingly, young people frequently
express a desire to prepare for occupations in which most of them can 
not possibly engage. For example, in a study of the vocational choices 
and opportunities of one thousand boys in a New York City junior 
high school great disparities were found: One hundred five boys out of 
one thousand wanted to be physicians when the demand in that locale 
was five per one thousand; sixty-four wanted to be lawyers, although 
the current rates were eight per one thousand population.² Similar
results were reported by Bedford³ in a study of twelve hundred eleven
high-school students in twelve California schools. In this investigation,
forty-two and four-tenths per cent of the students chose to enter
the professions, although the census of 1930 indicates that there were 
only six and seven-tenths per cent of the population thus employed. 
In another study involving twenty-two hundred seventy-four high-
school seniors in Wyoming, Kilzer⁴ reported that only twenty-two and
nine-tenths per cent selected the professions, and fifty-two and eight- 
tenths per cent were undecided as to vocational choice. But the 
choices were unrealistic in this case also since agriculture, forestry, and 
mining, which employ large numbers of people in Wyoming, were cited 
by a very small percentage of the pupils. 
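
The disparities reported in these studies can be expressed as simple ratios of aspirants to available places. The sketch below uses only the figures quoted in the text; the function name `oversubscription` is illustrative, and the comparison is rough, since the aspirant figures are per thousand pupils while the demand figures are per thousand of the population.

```python
# Figures quoted in the text: (aspirants, places), both per one thousand.
CHOICES_PER_THOUSAND = {
    "physician": (105, 5),
    "lawyer": (64, 8),
}


def oversubscription(aspirants: float, places: float) -> float:
    """Number of aspirants for each available place."""
    return aspirants / places


# Ratio for each occupation in the New York City junior high school study.
ratios = {
    job: oversubscription(aspirants, places)
    for job, (aspirants, places) in CHOICES_PER_THOUSAND.items()
}

# Bedford's California figures: 42.4 per cent chose the professions,
# while 6.7 per cent of the population was so employed.
professions_ratio = oversubscription(42.4, 6.7)

if __name__ == "__main__":
    for job, ratio in ratios.items():
        print(f"{job}: {ratio:.1f} aspirants per place")
    print(f"professions: {professions_ratio:.1f} aspirants per place")
```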

It appears that occupational choices are no more realistically made 
today than they were a decade ago. At that time, Lehman and 
Witty⁵ calculated the coefficient of correlation between the number
of employed workers and the number of aspirants in twenty-four 
occupations; most of the coefficients of correlation were of significant 
size, but all were negative. The coefficients for the higher chronolog- 
ical age levels appear particularly significant. They reflect again little 
change in efficiency of vocational choice (in terms of demand) with 
advance in chronological age. 

Similar results have been reported for Negro children. Hyte⁶ mentions
two unpublished studies: Mebane found that the professions were
the most popular choice in thirty-eight Negro schools in North Carolina; 
and Parks found that fifty-two per cent of the choices of seven hundred 
twenty-four high-school Negro students included vocations of a profes- 
sional nature. Hyte obtained corroborative results. Among eight 
hundred seventy Negro boys in twelve high schools in Indiana and 
Kentucky he discovered that over forty-four per cent of the choices 
were concentrated in two occupations—teaching and medicine. The 
parents of these boys made up only about six per cent of the persons 
engaged in these types of work. The distorted perspective of these 
pupils was further revealed by the fact that nearly seventy per cent 
of the boys indicated that they intended to go to college. When 
allowance is made for all the factors that may have influenced these 
reports, it is still abundantly clear that these expressions of preference 
and ambition reveal little more than the illusory hopes of uninformed 
youth. This situation is particularly unfortunate in the case of 
Negroes, since the toll of unemployment is especially large among the 
members of this group, and expectations are unlikely to be realized 
except in rare cases. It may prove of interest to examine this situation 
further, ascertaining the vocational choices of Negro youth in urban 
centers, and the extent to which the school has undertaken to assist
children in making intelligent discriminations in selecting their life 
work. 


THE PRESENT STUDY 


The present study has two purposes: (1) To compare the vocational 
interests of white and Negro children in urban centers and (2) to 
attempt to estimate the suitability or desirability of the children’s 
preferences. A total of sixteen hundred eighty-four high-school 
students, representing each of the four grades, were studied—seven 
hundred from a high school enrolling white children exclusively, and 
nine hundred eighty-four from a Negro high school. An inventory of 
interests and activities was distributed to these students and filled out 
in the presence of teachers or research workers.* A section of the 
inventory dealt with vocational interests, and contained the following 
questions: Have you decided as yet what occupation you would like 
to enter when you complete your education? Definitely? Uncer- 
tain? List in order of your interest occupations you would like to 
enter. Has any one assisted you in deciding upon a vocation? Who? 
Do you expect to finish high school? Do you intend to go to college? 
Do you want to go to college? 


RESULTS 


Tables I and II indicate the occupational preferences of the boys 
and the girls in the two schools. The students were allowed to list 
more than one choice; therefore, there are more choices than numbers 
of students. 


BOYS’ PREFERENCES 


In several ways the data corroborate the results of previous studies. 
In certain instances, however, there are unique differences. In the 
white high school, more than twenty-five per cent of the boys expressed 
an interest in engineering; this was closely followed by aviation 
(twenty-one per cent). Next in order were machinist (eleven per 
cent), forestry (seven and seven-tenths per cent) and office work 
(seven and one-tenth per cent). Medicine appealed to only four and 
seven-tenths per cent of the pupils. In the Negro school, postal work 
was the most popular choice (twenty-one and nine-tenths per cent); 


*The writers wish to acknowledge the work of Mr. Donald Dietrich who 
supervised the administering of the inventory in the two high schools in the 
metropolitan area of Chicago. 




it was followed by music, other than teaching, (fifteen per cent), 
medicine (thirteen and three-tenths per cent), law (twelve per cent), 
engineering (ten and eight-tenths per cent), aviation (nine and eight- 
tenths per cent) and teaching (nine and one-tenth per cent). 


TABLE I.—OCCUPATIONAL CHOICES OF BOYS


                                            Percentage
                                          White    Negro
Office ..................................   7.1      1.4
Carpentry-woodwork-cabinet making .......   4.4      2
Civil—government ........................   1.1      8.3
5.1 


It is interesting to note some differences between the two groups of 
boys. Postal work which included twenty-one and nine-tenths per 
cent of the choices in the colored school was mentioned by only one and 
one-tenth per cent of the white boys. Only two per cent of the white 
boys were interested in music, while fifteen per cent of the Negro boys
indicated a preference for it. Medicine, law, and teaching were more 
popular in the Negro school than in the white school. Although 
engineering and aviation were popular in both schools, they were 
twice as appealing in the white school as in the Negro school. 

The percentages of students interested in postal work and music
are of special interest since these occupations have recently employed
rather large numbers of Negro youth. A comparison of the choices in 
agriculture, forestry and civil service is also pertinent. While a total 
of thirteen and six-tenths per cent in the white high school were 
interested in the first two occupations, less than one per cent of the 
colored children expressed this interest. The Negro group, however, 
expressed greater interest in government and civil service employment
(eight and three-tenths per cent as against one and one-tenth per cent).

The Negro pupils differed in some ways from the group studied by 
Hyte.⁶ While twenty-three and eight-tenths per cent of his group
chose teaching, only nine and one-tenth per cent of this group did;
music was chosen by fifteen per cent of this group and by two per cent 
in Hyte’s investigation; twenty and five-tenths per cent of his group 
chose medicine, as compared with only thirteen and three-tenths per 
cent of the present group. Aviation was cited by three and one-tenth 
per cent of his group, and nine and eight-tenths per cent in this study. 


PREFERENCES OF GIRLS 


Three occupational groups included most of the choices of white 
and Negro girls. These occupations were clerical (including steno- 
graphic and office work), teaching, and nursing. The popularity of 
these occupational fields can be attested to by the fact that ninety- 
three per cent of the white and ninety-nine per cent of the Negro girls 
listed one of these three as a preferred vocation. In addition to the 
above, the white girls were interested in journalism, modeling, avia- 
tion—air stewardess, designing and art. Occupations in which the 
Negro girls expressed a greater interest included social work and sew- 
ing (dressmaking). 

In the other occupations the vocational choices of the two groups 
were strikingly similar. Beautician, for example, claimed eleven and 
seven-tenths per cent of the white and eleven and five-tenths per cent 
of the Negro citations. 

Forty-eight per cent of the white and fifty-eight per cent of the 
Negro pupils stated that they had made definite vocational choices; 
uncertainty was indicated by forty-seven per cent of the white and
thirty-three per cent of the Negro students. Less than one-half of the 
pupils in each group had received assistance in making their choices.
Both groups reported that parents provided the greatest amount of 
help. Twenty-six per cent of these white students mentioned help 
from the school alone and twenty per cent from school and parents. 


TABLE II.—OCCUPATIONAL CHOICES OF GIRLS


Percentage 

10.9 .6 


In the Negro school only five per cent stated that assistance came from 
the school alone, and one and six-tenths per cent from school and 
parents. Thus the school appears to play a secondary rôle in these
choices, especially in the Negro school where its influence is almost 
negligible. 


EDUCATIONAL AMBITIONS 


Almost all students in both schools stated that they planned to 
finish high school (ninety-eight per cent in the white and ninety-nine
per cent in the Negro school). In addition, forty-four per cent of the 
white and sixty-five per cent of the Negro pupils expected to go to
college. These results correspond rather closely with Kilzer’s data.⁴
Kilzer reported that fifty-four and seven-tenths per cent of twenty-two 
hundred seventy-four high-school seniors in Wyoming indicated in the 
Spring of the year of graduation that they intended to go to college; of 
these, thirty-six and five-tenths per cent actually entered. Although 
the present study includes samplings from each of the four grades in 
the high school, the percentages do not vary greatly from grade to 
grade. It is interesting to note that more students expect to go to
college in the Negro high school than in the white, although the former 
come from a somewhat poorer socio-economic environment. 


IMPLICATIONS 


It seems apparent that the vocational ambitions of boys and girls 
in our secondary schools are to a large degree merely the expression 
of illusory aspirations. This situation is not confined to any grade, 
nor is it improved during the four years of the secondary school. In 
fact, there appears to be little advance toward a realistic and intelli- 
gent evaluation of the world of occupations as children mature. The 
choices of twelfth-grade pupils differ little from those of the ninth 
grade. Moreover, the children’s own testimony discloses the negli- 
gence of the school in providing appropriate guidance based upon 
increasingly relevant experience and information. Accordingly, few 
pupils in any grade believe that they have received help from the school 
in their problems of vocational choice. When help is received, it is 
obtained from the parents who, in most instances, are doubtless unpre- 
pared to offer valuable information or guidance. The extreme preva- 
lence of uncertainty, indecision, and anxiety about occupations and 
employment is in part the product of the school’s failure to recognize 
and assume its responsibility in the area of vocations. Despite the 
widespread emphasis upon vocational competency and understanding 
as an educational objective, it appears from this study that the voca- 
tional choices of secondary-school children are as unreal and unwhole- 
some as they were a decade or two ago. The results of such conditions 
are far-reaching: They necessitate adjustments that are personally 
unhygienic and socially undesirable. 

Lawrence K. Frank points clearly to the danger of such unjustifiable
expectancies. “. . . we must acknowledge that most of the
contemporary careers we urge on youth are in truth but defenses
against anxiety and emotional defeat—competitive struggles for power, 
prestige, or property that reflect the childhood insecurities from which 
the individual is fleeing and that threaten him with new insecurities 
from the other aggressive individual he must challenge. Such designs 
for living are neither mentally hygienic nor socially desirable, but at
present they are the only socially approved uses to which youth is 
asked to dedicate his life. . . . If we are to be sincere, we can but 
point out the futility of the competitive struggle that leads to no 
personal fulfillment because it arises from inner personal distortion and 
insecurity which no amount of achievement, property, or prestige can 
assuage. In contrast, we can try to give youth an understanding of 
how his or her own personal life can be made significant and enriched 
not merely by achievement or acquisition, but by the quality of human 
relations he or she can sustain.”⁷

The situation which has been described can not be improved 
greatly unless children are led to understand themselves and their own 
motives; moreover, they must understand the motives of a people who 
have made work a psychological imperative and have endowed certain 
occupations with the highest sanction. In addition, there appears to 
be a great need for a more honest approach in education in which 
children’s choices of occupation will be made and evaluated in terms of 
an understanding of the nature and cause of occupational demand and 
limitation. These objectives can not be realized unless teachers 
become seriously concerned about the vocations and occupational 
choices as problems involving and affecting the individual, the com- 
munity, and society. 

It will be noted that the vocational choices of these young people 
contain few items which disclose interests in spontaneous or creative 
pursuits. Moreover, the seeming futility of our educational system is 
reflected in the concentration of interest in a small number of highly 
competitive occupations. From every analysis, there emerges the 
need for a more comprehensive and realistic treatment of vocations 
and occupations in the secondary school. It becomes clear also that 
schools must aim to alter the conditions which lead to an unrealistic 
understanding of society, including the world of vocations. They 
should attempt to lead children to discover the manifold ways in which 
personal life can be rich in human relationships and hence satisfying 
and productive. In such a design for coöperative living, vocational
information, understanding, and experience will be properly oriented 
and, accordingly, will enrich and enhance human values. 




BIBLIOGRAPHY 


1. Bingham, Walter: Aptitudes and Aptitude Testing. Harper and Brothers, 
New York, 1937, p. 106. 

2. Op. cit., p. 108. 

3. Bedford, James H.: Youth and the World’s Work. Society for Occupational 
Research, Ltd., University of Southern California Station, Los Angeles, 1938, 
p. 16. 

4. Kilzer, L. R.: “Vocational Choices of High-school Seniors.” Educational
Administration and Supervision, Vol. XXI, November, 1935, pp. 576-581.

5. Lehman, Harvey C. and Witty, Paul A.: “Vocational Guidance: Some Basic
Considerations.” Journal of Educational Sociology, Vol. VIII, November,
1934, pp. 174-184.

6. Hyte, Charles: “Occupational Interests of Negro High-school Boys.” School
Review, Vol. XLIV, January, 1936, pp. 34-49.

7. Frank, Lawrence K.: “The Reorientation of Education to the Promotion of
Mental Hygiene.” Mental Hygiene, Vol. XXIII, October, 1939, p. 540.




A COMPARISON OF TEST RECORDS AND CLINICAL 
EVALUATIONS OF PERSONALITY ADJUSTMENT 


D. D. FEDER 
University of Illinois 
AND 
L. OPAL BAER 


The attempt to devise a convenient and simple measurement of 
personality adjustment comparable in validity to measurements of 
intelligence and achievement has resulted in the development of a 
number of questionnaire scales. However, almost no effort has been 
made to determine the validity of such instruments by comparing 
their diagnoses with clinical or observational evaluations. Most 
authors have been concerned chiefly with statistical analyses of the 
discriminatory value of items and scales despite the fact that the ulti- 
mate criterion for such scales is the behavioral adjustment of the 
individual.¹

The present study had two main purposes: (1) To determine the 
effect upon responses when the Bernreuter Personality Inventory was 
taken (a) with normal directions, considering all items in terms of 
their occurrence within the total life span of the individual; (b) one 
month later, with experimental directions instructing the subjects 
to consider all questions with reference to whether or not the incident 
or conditions concerned had occurred within the past year; and (2) to
study the validity of the Bernreuter Inventory by comparing scores
obtained on it with evaluations of the behavior records of each subject.
This was done for scores obtained under both the regular and the
experimental directions.

The subjects were tested individually under most favorable con- 
ditions of rapport, making it possible to take cognizance of both 
psychological and physical factors. Eighty-one University of Iowa 


1 Lorge, Irving: “Personality Traits by Fiat: I. The Analysis of the Total Trait
Scores and Keys of the Bernreuter Personality Inventory.” J. Educ. Psychol.,
Vol. XXVI, 1935, pp. 273-278.

Lorge, I., Bernholz, E., and Sells, S. B.: “Personality Traits by Fiat: II. The
Consistency of the Bernreuter Personality Inventory by the Bernreuter and the
Flanagan Keys.” J. Educ. Psychol., Vol. XXVI, 1935, pp. 427-434.

The study by Darley, “Test Maladjustment Related to Clinically Diagnosed
Maladjustment,” J. Appl. Psychol., Vol. XXI, Dec., 1937, pp. 632-642, seems to be
one of the first studies in this highly important area of validation.



undergraduate girls residing in a semi-coöperative dormitory, and one
girl living outside the dormitory, were observed during the course of 
the year, and their behavior recorded upon standardized personnel 
records of the Dean of Women’s office. The tester was able to 
achieve intimate knowledge of each girl and her personal problems 
through daily contacts and interviews, normally initiated by the girls 
themselves. 

In addition to the Bernreuter Personality Inventory, the American 
Council on Education Psychological Examination was administered 
to all but six of the girls in order to secure information concerning 
intelligence. Scholastic records of each girl were obtained from the 
Registrar’s office. 


TABLE I.—EVALUATION OF MEAN DIFFERENCES OBTAINED BY REGULAR AND
EXPERIMENTAL DIRECTIONS


Scale (AM, σ for each):    B1-N | B2-S | B3-I | B4-D | F1-C | F2-S

Regular directions:        -22.82, 82.00 | 1.87, 50.83 | -8.57, 48.22 | 25.56, 60.14 | 23.23, 88.15 | -5.76, 54.11

Experimental directions:   ..... | 38.28, 65.90 | 7.28, 94.13 | -15.42, 58.47

Difference:                4.28 ..... 7.91 ..... -10.72 | 15.95 ..... | 9.66 .....

N = 79¹


1 Number for whom complete data were available. 
2 The formula used was that for the difference between means of correlated measures: 


$\sigma_{(x-y)} = \sqrt{\sigma_x^2 + \sigma_y^2 - 2\, r_{xy}\, \sigma_x \sigma_y}$
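
The formula for the difference between means of correlated measures, and the critical ratio derived from it, can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the sigma terms are taken to be standard errors of the two means, the group size is read as 79, and the correlation `r_assumed` is hypothetical because the text does not report it. All function names are illustrative.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of a mean from the group's standard deviation and size."""
    return sd / math.sqrt(n)


def se_of_difference(se_x: float, se_y: float, r_xy: float) -> float:
    """Standard error of the difference between two correlated means."""
    variance = se_x ** 2 + se_y ** 2 - 2 * r_xy * se_x * se_y
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding error


def critical_ratio(mean_x: float, mean_y: float, se_diff: float) -> float:
    """Difference between the means divided by its standard error."""
    return (mean_x - mean_y) / se_diff


if __name__ == "__main__":
    n = 79            # assumed group size (the N printed under Table I, read as 79)
    r_assumed = 0.90  # hypothetical correlation between testings, not from the source
    se_regular = standard_error_of_mean(60.14, n)       # B4-D sigma, regular directions
    se_experimental = standard_error_of_mean(65.90, n)  # B4-D sigma, experimental directions
    se_diff = se_of_difference(se_regular, se_experimental, r_assumed)
    print(critical_ratio(38.28, 25.56, se_diff))
```

Because the two testings are correlated, the subtracted term shrinks the standard error, so a given mean difference yields a larger critical ratio than it would for independent groups.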


Comparison of the means and standard deviations (Table I) of 
the subjects on the two testings reveals a general tendency toward 
improvement in personality adjustment under the conditions of the 
second testing. When the subjects answered the questions in terms 
of their experiences of the last year, the obtained scores showed them 
to be less neurotic, less inclined toward self-sufficiency and more 
gregarious, less introverted, more constructively dominant in social 
situations, more wholesomely self-confident and more sociable than 
they were when they responded in terms of their entire life span. The 
absence of perfect positive correlation between the two testings sug- 
gests that there was both positive and negative displacement of 


1 The tester, Miss Baer, served as social and personnel director of the dormitory. 




scores. Only one of the mean differences, that of the dominance scale, 
is statistically significant, the critical ratio being 4.5. Although there 
were no great changes in the group’s variability under the experimental 
conditions, all except one standard deviation increased, indicating a 
wider spread of scores when the subjects reacted in terms of more 
recent experiences. 

A detailed analysis was made of the frequency of changed responses 
for each item under the varied directions. Change in answer by ten 
per cent or more of the group was arbitrarily taken as the criterion 
for the selection of the items which follow. The following items
showed a greater proportion of change from “Yes” to “No” under the
modified directions:


Does it make you uncomfortable to be “different” or unconventional?

Have you ever crossed the street to avoid meeting some person? 

Do you ever give money to beggars? 

Do you often feel just miserable? 

Are you touchy on various subjects? 

Do you like to bear responsibilities alone? 

Do you usually prefer to do your own planning alone rather than with 
others?

Are you easily moved to tears? 

Do you find it difficult to speak in public? 

If you are spending an evening in the company of other people do you 
usually let someone else decide upon the entertainment? 

Does your mind often wander so badly that you lose track of what you 
are doing? 

Do you ever argue a point with an older person whom you respect? 

Do you prefer to be alone at times of emotional stress? 

Do you usually prefer to keep your feelings to yourself? 

Do you usually face your troubles alone without seeking help? 

Are you often in a state of excitement? 


The following items showed a greater proportion of change from
“No” to “Yes” under the modified directions:


Do you try to get your own way even if you have to fight for it? 

Do you blush very often? 

Do athletics interest you more than intellectual affairs? 

Have you ever tried to argue or bluff your way past a guard or doorman?

Do you usually try to avoid dictatorial or “bossy” people?


1 These questions are reproduced with the permission of the Stanford Univer- 
sity Press. 




Do you find conversation more helpful in formulating your ideas than 
reading? 

Do you want someone to be with you when you receive bad news? 

Does it bother you to have people watch you at work even when you do 
it well? 

Do you make new friends easily? 

Do you especially like to have attention from acquaintances when you 
are ill? 

Are you able to play your best in a game or contest against an opponent 
who is greatly superior to you? 

When you are in low spirits do you try to find someone to cheer you up?

Does your ambition need occasional stimulation through contact with 
successful people? 


On the following items, where considerable change was noted, the 
numbers of changes were almost identical: 


Can you stand criticism without feeling hurt? 

Are you much affected by the praise or blame of many people? 

Are you slow in making decisions? 

Do you very much mind taking back articles you have purchased at stores? 

Would you rather work for yourself than carry out the program of a 
superior whom you respect? 

Do you worry too long over humiliating experiences? 

Have you ever organized any clubs, teams, or other groups on your own 
initiative? 

Do you ever rewrite your letters before mailing them? 

Do you usually enjoy spending an evening alone? 

If you are dining out do you prefer to have someone else order dinner for 
you? 

Do you usually feel a great deal of hesitancy over borrowing an article from 
an acquaintance? 

Do you often find that you cannot make up your mind until the time for 
action has passed? 

Do you experience many pleasant or unpleasant moods? 

Does some particularly useless thought keep coming into your mind to 
bother you? 

Are you willing to take a chance alone in a situation of doubtful outcome? 

Do you try to treat a domineering person the same as he treats you? 

Do you have difficulty in making up your mind for yourself? 

At a reception or tea do you feel reluctant to meet the most important 
person present? 

Do you usually prefer to work with others? 

Do you worry over possible misfortunes? 




Can you stick to a tiresome task for a long time without someone prodding 
or encouraging you? 

Do you get as many ideas at the time of reading a book as you do from a 
discussion of it afterward? 

Do you prefer making hurried decisions alone? 

Do you usually try to take added responsibilities on yourself? 

Do you greatly dislike being told how you should do things? 


There were relatively few changes in the direction of uncertainty, 
as indicated by the question mark answer. The changes away from 
uncertainty did not occur with noticeable frequency. 
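
The ten-per-cent criterion used to select items in the analysis above can be sketched as a simple filter. The item labels and counts below are hypothetical, since the per-item change frequencies are not reported in the text, and the function name `items_meeting_criterion` is illustrative.

```python
def items_meeting_criterion(changes: dict, group_size: int, threshold: float = 0.10) -> list:
    """Return the items whose answer was changed by at least `threshold` of the group.

    `changes` maps an item label to the number of subjects who changed their answer.
    """
    return [
        item
        for item, n_changed in changes.items()
        if n_changed / group_size >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical counts for three items in a hypothetical group of 80 subjects.
    hypothetical_changes = {"item_a": 8, "item_b": 7, "item_c": 20}
    print(items_meeting_criterion(hypothetical_changes, group_size=80))
```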


TABLE II.—EXTENT OF AGREEMENT BETWEEN VERY EXTREME MALADJUSTMENT
PLACEMENTS ON THE BERNREUTER PERSONALITY INVENTORY AND CLINICAL
OBSERVATIONS

                                                                  Scale
                                                            B1-N  B3-I  F1-C

1. Number cases rated maladjusted—first testing ..........    3     4     3
   Number cases in 1 having positive clinical records ....    0     2     3
   Per cent of agreement with positive clinical records ..    0    50   100
2. Number cases rated maladjusted—second testing .........    5     1     3
   Number cases in 2 having positive clinical records ....    1     0     2
   Per cent of agreement with positive clinical records ..   20     0    67
3. Number cases rated maladjusted—first testing plus
   second testing (without overlap) ......................    5     4     4
   Number cases in 3 having positive clinical records ....    1     2     3
   Per cent of agreement with positive clinical records ..   20    50    75
4. Number cases maladjusted on both testings .............    3     1     2
   Number cases in 4 having positive clinical records ....    0     0     2
   Per cent of agreement with positive clinical records ..    0     0   100


In general, it will be noted that when the subjects were instructed 
to respond in terms of the immediately preceding year’s experiences, 
their responses indicate greater socialization and more mature per- 
sonality adjustment. There appeared to be no adverse changes of 
serious magnitude or frequency directly traceable to the year’s experi- 
ence in the University. 

In order to determine the validity of the personality test scores 
as compared with actually observed behavior, a detailed report of the 
behavior patterns of each subject was prepared using many specific 
incidents collected during the year. These descriptions, with all 
identifying information omitted, were studied independently by the
authors and classified by quartiles as to the nature and quality of 
their adjustments. There was slightly better than ninety per cent 
agreement in the two classifications. The pattern descriptions fur- 
nished by Bernreuter and Flanagan in the manual of the Inventory 
were used as the basis for classification. 


TABLE III.—EXTENT OF AGREEMENT BETWEEN EXTREME DECILE MALADJUSTMENT
PLACEMENTS ON THE BERNREUTER PERSONALITY INVENTORY AND CLINICAL
OBSERVATIONS

                                                            B1-N  B2-S  B3-I  B4-D  F1-C  F2-S

1. Number cases in maladjusted decile—first testing ......   16    10    10     7    14    20
   Number cases in 1 having positive clinical records ....    6     3     3     2    10    11
   Per cent of agreement with positive clinical records ..   38    30    30    29    71    55
2. Number cases in maladjusted decile—second testing .....   11    12     7     9    11    15
   Number cases in 2 having positive clinical records ....    1     2     3     2     9     9
   Per cent of agreement with positive clinical records ..    9    17    43    22    82    60
3. Number cases in maladjusted decile—first testing plus
   second testing (without overlap) ......................   18    16    12    11    19    23
   Number cases in 3 with positive clinical records ......    6     3     3     2    14    12
   Per cent of agreement with positive clinical records ..   33    19    25    18    74    52
4. Number cases in maladjusted decile on both testings ...    9     6     5     5     6    12
   Number cases in 4 with positive clinical records ......    1     2     3     2     5     8
   Per cent of agreement with positive clinical records ..   11    33    60    40    83    67


Table II affords a comparison between the Personality Inventory 
scores and the case history records of those subjects who in terms of 
the Inventory norms would have been judged extremely maladjusted. 
The table may be read as follows: On the first testing, three cases were 
rated extremely neurotic and in need of immediate psychiatric atten- 
tion in terms of the published norms; none of these three had a clinical 
record of serious maladjustment; therefore the per cent of agreement
between the two records is zero. The rest of the data may be read in
a similar manner. 
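
The reading rule just described amounts to a simple ratio: cases with a matching clinical record divided by cases rated by the Inventory. The following is a minimal sketch of that arithmetic, not the authors' own procedure. The function name is hypothetical, and the counts are the rows 3 and 4 of Table II as printed; the order of the three scale columns is an assumption.

```python
def percent_agreement(rated: int, confirmed: int) -> int:
    """Per cent of Inventory-rated cases that also have a matching clinical record."""
    if rated == 0:
        raise ValueError("no cases were rated, so agreement is undefined")
    return round(100 * confirmed / rated)


# Worked example from the text: three cases rated, none confirmed clinically.
first_testing_example = percent_agreement(rated=3, confirmed=0)

# Rows 3 and 4 of Table II as printed (three scales; column order is assumed).
combined_rated, combined_confirmed = [5, 4, 4], [1, 2, 3]
both_rated, both_confirmed = [3, 1, 2], [0, 0, 2]

combined_pct = [percent_agreement(r, c)
                for r, c in zip(combined_rated, combined_confirmed)]
both_pct = [percent_agreement(r, c)
            for r, c in zip(both_rated, both_confirmed)]

print(first_testing_example, combined_pct, both_pct)
```

With so few cases in each cell, a change of a single case moves the percentage by twenty points or more, which should be kept in mind when the columns are compared.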

On the whole, there is a marked lack of agreement between the 
maladjustment diagnoses of the B1-N and B3-I scales and the clinical 
diagnoses. In fact, in only a very few of these cases were the observed 
behavior problems of such magnitude as to be seriously troublesome, 
suggesting that the published norms do not yield diagnoses closely
comparable with those secured by the observation and analysis of 
behavior. However, there is a high degree of correspondence between 
the F1-C diagnoses and the case history records. Extreme critical 
points are not given for the other three scales. 

In the expectation that the Inventory scores might have greater 
validity when taken over a wider range, the cases falling in the extreme 
decile of maladjustment were taken for comparison with their behavior 
records as indicated in Table III. In general the results parallel those 
obtained when only the cases of very extreme maladjustment were 
considered. 

Although the experimental directions caused some changes in per 
cent of agreement between Inventory scores and case records, the 
changes were not great and not consistent for all the scales. 

Fully as important as the function of recognizing the maladjusted 
individual is that of discovering the well-adjusted one, both for the 
clinical purpose of securing descriptions of adequate behavior adjust- 
ment, and, from the standpoint of the individual, for the purpose of 
maintaining and augmenting satisfactory adjustments. The manual 
of the Bernreuter Personality Inventory does not furnish norms 
indicative of good adjustment. Therefore, for the purposes of this 
study, the decile of “best” scores was taken as indicative of the
desirable extreme. This time the good adjustment scores were com- 
pared with cases having negative (well-adjusted) clinical records. 
These data are summarized in Table IV and may be read exactly as 
the preceding table. Thus, of the six cases rated at the desirable 
extreme on the B1-N scale, four displayed no serious behavior malad- 
justments. However, two of the six were sufficiently maladjusted so 
that they could not be considered representative of the most desirable 
extreme of behavior. The other columns in this table may be read in 
similar fashion. 
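
The third block of each table counts cases placed in the extreme decile on either testing. A short sketch of the bookkeeping follows, on the assumption that a case appearing on both testings is counted once in that block; the helper name is hypothetical and the figures are those printed in Table IV.

```python
def combined_without_overlap(first: int, second: int, both: int) -> int:
    """Cases placed on either testing, counting a case on both testings once."""
    return first + second - both


scales = ["B1-N", "B2-S", "B3-I", "B4-D", "F1-C", "F2-S"]
test_1 = [6, 7, 8, 10, 8, 2]            # block 1 of Table IV
test_2 = [6, 9, 11, 10, 11, 2]          # block 2
on_both = [5, 4, 8, 7, 8, 1]            # block 4
printed_combined = [7, 12, 11, 13, 11, 3]   # block 3 as printed

computed = [combined_without_overlap(a, b, c)
            for a, b, c in zip(test_1, test_2, on_both)]

for scale, got, printed in zip(scales, computed, printed_combined):
    print(f"{scale}: computed {got}, printed {printed}")
```

The computed totals can be set beside the printed ones as a consistency check on the table.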

The data indicate that the Inventory is on the whole considerably 
more dependable for the discovery of well-adjusted students than for 
those who are maladjusted. Again, however, there appears to be no
marked advantage in favor of either the normal or experimental
directions. 

Comparison of behavior records of cases whose scores placed them 
at the extremes of good and bad adjustment revealed numerous 
differences between the clinical findings and the trait descriptions of 
expected behavior given in the Inventory Manual. These divergences 
may be summarized as shown by Table IV. 


TABLE IV. EXTENT OF AGREEMENT BETWEEN GOOD ADJUSTMENT PLACEMENTS
ON THE BERNREUTER PERSONALITY INVENTORY AND CLINICAL OBSERVATIONS

                                          B1-N  B2-S  B3-I  B4-D  F1-C  F2-S

1. Number cases rated in well-adjusted
   decile, test I.......................     6     7     8    10     8     2
   Number cases in 1 having negative
   clinical records.....................     4     5     2     4     6     1
   Per cent of agreement with negative
   clinical records.....................    67    71    25    40    75    50

2. Number cases rated in well-adjusted
   decile, test II......................     6     9    11    10    11     2
   Number cases in 2 having negative
   clinical records.....................     4     8     2     3     8     0
   Per cent of agreement with negative
   clinical records.....................    67    89    18    30    73     0

3. Number cases rated in well-adjusted
   decile, test I and test II
   (without overlap)....................     7    12    11    13    11     3
   Number cases in 3 having negative
   clinical records.....................     5    10     2     4     8     1
   Per cent of agreement with negative
   clinical records.....................    71    83    18    31    73    33

4. Number cases rated in well-adjusted
   decile on both testings..............     5     4     8     7     8     1
   Number cases in 4 having negative
   clinical records.....................     3     3     2     3     6     0
   Per cent of agreement with negative
   clinical records.....................    60    75    25    43    75     0

MANUAL DESCRIPTIONS AND CLINICAL OBSERVATIONS


B1-N

Manual description: “B1-N. A measure of neurotic tendency. Persons
scoring high on this scale tend to be emotionally unstable. Those
scoring above the 98th percentile would probably benefit from
psychiatric or medical advice. Those scoring low tend to be very well
balanced emotionally.”

Clinical observations: Emotional imbalance was frequently a very
specific response to peculiar conditions. Individuals exhibited
imbalance under certain conditions, and were completely stable under
other conditions. The total score was not sufficiently diagnostic of
these differences. A girl with a family history of insanity, and
neurotic tendencies manifested in petty thieving, was one of five
serious problem cases who achieved ratings of satisfactory adjustment.


B2-S

Manual description: “B2-S. A measure of self-sufficiency. Persons
scoring high on this scale prefer to be alone, rarely ask for sympathy
or encouragement, and tend to ignore the advice of others. Those
scoring low dislike solitude and often seek advice and encouragement.”

Clinical observations: Cases who asked for advice and sympathy did not
necessarily dislike solitude. Often their preference for solitude was
a manifestation of their maladjustment. By actual count of individuals
and occasions, it was found that individuals scoring high frequently
asked for advice and followed it.


B3-I

Manual description: “B3-I. A measure of introversion-extroversion.
Persons scoring high on this scale tend to be introverted; that is,
they are imaginative and tend to live within themselves. Those scoring
low are extroverted; that is, they rarely worry, seldom suffer
emotional upsets, and rarely substitute day dreaming for action.”

Clinical observations: Persons scoring high on the tests did not show
specific and positive clinical records of imaginative tendencies as
frequently as other cases which scored close to the median.
Individuals scoring extremely low on the scale were as apt to worry
and have emotional upsets as many others scoring at and above the
median.


B4-D

Manual description: “B4-D. A measure of dominance-submission. Persons
scoring high on this scale tend to dominate others in face-to-face
situations. Those scoring low tend to be submissive.”

Clinical observations: By actual count of situations persons scoring
high on this scale were not any more given to dominating others than
many who scored at or below the median. Often those scoring high were
observed to be more subtle in their domination than those who obtained
lower scores. Two cases dropped from the 99th percentile on the first
testing to the 1st percentile on the second testing; nothing in the
case histories warranted the expectation of such marked changes.
Several cases of maladjustment arising from obnoxious attempts at
social domination rated very low. Eight individuals, so excessively
submissive as to constitute a social problem, were not discovered at
all in terms of test scores.


F1-C

Manual description: “F1-C. A measure of confidence in oneself. Persons
scoring high on this scale tend to be hamperingly self-conscious and
to have feelings of inferiority; those scoring above the 98th
percentile would probably benefit from psychiatric or medical advice.
Those scoring low tend to be wholesomely self-confident and to be very
well adjusted to their environment.”

Clinical observations: Only fourteen of the twenty-four cases which
had serious difficulties throughout the year were classified as being
self-conscious in combined results of both testings. Nine subjects who
through self-analysis felt that self-consciousness was their chief
difficulty received no special notice in terms of Inventory scores.
Persons scoring low, according to specific personal data, were
self-confident, but often neither “wholesomely” so nor very well
adjusted to their environment. Two cases with low F1-C scores were
problem cases of unwholesome self-confidence arising as a compensatory
mechanism.


F2-S

Manual description: “F2-S. A measure of sociability. Persons scoring
high on this scale tend to be non-social, solitary or independent.
Those scoring low tend to be sociable and gregarious.”

Clinical observations: Several subjects scoring in highest range
actually had difficulties because they were too sociable, and spent
too much time in various social activities. One case, who received the
lowest rating of all, never dated, did not join in house parties,
refrained from speaking in house meetings, and was almost totally
unacquainted with other girls in the dormitory.


SUMMARY AND CONCLUSIONS 


This study attempted to determine: (1) the effect of a modification
in directions causing subjects to respond to the questions of
the Bernreuter Personality Inventory in terms of experiences of the
preceding year as well as of the total life span; and (2) the validity
of the classifications furnished by the Inventory scores as compared
with clinical observations of the subjects’ behavior during a scholastic
year. The subjects were eighty-one residents of a girls’ coöperative
dormitory and one non-resident student at the University of Iowa.
Intimate contact with the subjects was established by the tester by 
virtue of her being social and personnel director of the dormitory. 
Subjects were tested and retested individually so as to assure closely 
comparable physiological and psychological conditions for the two 
testings. A detailed behavior record with many specific illustrative 
incidents was kept for each subject throughout the year. The records 
were kept on an objective, descriptive level; no interpretations were 
made until the complete record for the year was available. On the 
basis of these records it was possible to form an estimate of the per- 
sonality characteristics and adjustment of each girl. 

The findings suggest that the two scorings yield such closely 
comparable results that either kind of directions may be used with 
almost identical results. There appears to be much overlapping of 
test scorings, as, for example, in the case of the neurotic and introver- 
sion scales. In other areas, the statistical findings suggest patterns 
of traits which are not normally associated in clinical findings regarding 
personality adjustment, and the absence of certain relationships 
generally found in the psychological clinic. 

The close intercorrelations between the neurotic, introversion,
submissiveness, and self-consciousness scales suggest widespread
overlapping in these areas. Introversion and self-consciousness are highly
related as are extroversion and self-confidence, indicating further 
overlapping. Dominant traits seem to be associated with self-con- 
fidence to a high degree. As a matter of fact, some of the interscale 
correlations are higher than the intercorrelations between the two 
testings for one given scale. 

The differences in mean scores suggest that in terms of events and 
reactions during the immediately preceding year, the group as a whole 
was somewhat better adjusted in all phases. Only one of these 
differences (dominance-submission) was statistically significant. A 
slightly greater variability occurred when the questions were con- 
sidered in terms of the last year’s events. Analysis of specific item 
changes indicates that the year of college experience contributed to 
greater socialization and maturity of attitude. It is suggested that 
the apparent improvement in personality adjustment may possibly 
be a result of greater willingness to criticize the past rather than the 
present self. A need for further study of this point is indicated. 

A large amount of disagreement was found between the Inventory 
scores and the actual behavior records of maladjustment. Personality 
problems, arising only in the face of certain specific circumstances,
were frequently obscured by the generalized type of questions con-
tained in the Inventory. Cases which should be recommended for 
psychiatric treatment in terms of the norms, actually manifested 
problems of less seriousness than other cases given satisfactory adjust- 
ment ratings. 

At the extreme of good adjustment the Inventory scores showed 
somewhat better but far from perfect agreement with the clinical 
records. Serious problems of behavior were found as frequently 
among subjects at this end of the various scales, as were noted at 
the extreme of maladjustment. 

It made but little difference in the foregoing comparisons whether 
the inventory scores used were based upon the usual directions or 
the experimental ones. However, there appeared to be slightly better 
agreement when comparisons were based upon those cases holding the 
same relative positions on both testings. 

The factor of recency of experiences as related to personality 
adjustment is one whose importance may perhaps only be determined 
through further clinical investigation. However, as far as the responses 
to the Inventory under consideration are concerned, recency of experi- 
ences seems to make little difference in the validity of the scores. 

Extensive discussions with the subjects after they had taken the 
Inventory suggest that a consideration which is probably of basic 
significance in partially explaining the lack of validity of personality 
questionnaires lies in the fact that the situations upon which the 
questions are based are so general that most subjects are unable to 
answer with either “yes” or “no” and have a feeling that this is an
accurate personal description. Oftentimes the conditions surrounding 
a particular experience are highly specific and the reactions elicited 
could never be found again without an exact duplication of the original 
psychological situation. 

Another interesting point made by many of the subjects was the 
fact that they themselves were not conscious of the occurrence of 
certain behavior on their parts and, therefore, gave, without at all 
intending to, a picture of their behavior that often did not square with 
that which was objectively observed by their associates. 

In view of the fact that the Inventory was given under the most 
ideal conditions, with a thorough working rapport established, the 
fact that it identified so few of the clinically diagnosed cases of per- 
sonal maladjustment raises serious question as to the validity of this 
and similarly conceived tests. 


THE RELATIONSHIP BETWEEN FACTORS OF OCULAR 
EFFICIENCY AND EYE-MOVEMENT MEASURES 
AT THE COLLEGE LEVEL 


ROBERT Y. WALKER AND HERMAN MOLISH 
Ohio State University 


Several recent theories on possible causes for reading difficulty 
have as their basis possible refractive errors or intra-ocular incoördination
of the eyes during reading.* The present problem attempts to
determine the relationship between factors of ocular efficiency and 
eye-movement measures at the college level. 

The foundation of such theories developed from the early work on
eye-movements of Javal, and Erdman and Dodge. Later research
has generally been centered around the problem of determining to
what extent the so-called “peripheral factors,” or eye-movements,
may be regarded as fundamental causes of reading deficiency. Betts,
Clark, Dearborn, Eames, Gates and Bennett, Selzer, and Wagner
are representative of those who contend that reading disability and
visual inefficiency are closely related and that visual deficiency is the
determinant of the reading deficiency.

Other research followed which examined the “central factors” in
their relation to reading disability. Anderson and Kelley, Gray,
Ruediger, Stromberg, Swanson and Tiffin, and Witty and Kopel
are representative of those who believe that reading disability is not
related to visual inefficiency among elementary or college students.

Recent experimentation indicates that neither view can be held 
independently of the other. Farris and Fendrick are examples of
this point of view. Although disclaiming any relationship between 
muscular anomalies of the eye and reading, they hold that refractive 
errors of vision may be related to reading disability and inefficiency 
and should be regarded as peripheral symptoms of “central” defi- 
ciencies. In a final consideration of the literature a combination of 
the peripheral and central factors as interacting must be indicated. 

* Reading difficulty is a relative measure and depends to a large extent on the
educational and vocational level of the case.

In order to prosecute the investigation of ocular efficiency in rela-
tion to eye-movements, ninety-six college students were used as sub-
jects in the study. Fifty-four of these subjects were selected from a
course dealing with the psychology of effective study and individual
adjustment. As a group they were inferior in scholarship. The remain-
ing forty-two subjects were selected from an introductory course in
general psychology and were of average or better than average scholar-
ship. The mean point-hour ratio for the higher scholarship group
(N = 42) was 2.98; that of the lower scholarship group (N = 54), 1.39.
A point-hour ratio of 4.00 is the equivalent of an A record; 3.00, a B;
2.00, a C; and 1.00, a D. The higher scholarship group had a mean
centile rank of 93.9 on the Ohio College Association Aptitude Test; the
lower scholarship group, a mean centile of 52.6.
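
For readers unfamiliar with the point-hour ratio, the following sketch shows one common way such a ratio is computed. The weighting by credit hours is an assumption, since the text gives only the letter-grade equivalents; the function name and the sample record are hypothetical.

```python
# Grade points per credit hour, as given in the text (A = 4.00 ... D = 1.00).
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}


def point_hour_ratio(courses):
    """Return total grade points divided by total credit hours.

    `courses` is a list of (letter_grade, credit_hours) pairs. Weighting by
    credit hours is assumed; the source does not describe the computation.
    """
    total_hours = sum(hours for _, hours in courses)
    if total_hours == 0:
        raise ValueError("no credit hours recorded")
    total_points = sum(GRADE_POINTS[grade] * hours for grade, hours in courses)
    return total_points / total_hours


# Hypothetical student record: a B in a 5-hour course, a C in a 3-hour
# course, and a D in a 2-hour course.
example_ratio = point_hour_ratio([("B", 5), ("C", 3), ("D", 2)])
print(round(example_ratio, 2))
```

On this reading, the lower scholarship group's mean of 1.39 lies between a D and a C record, and the higher group's mean of 2.98 is very nearly a B record.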

In order to determine the relationship existing between eye- 
movement measures and factors of vision, eye-movement records of 
reading were obtained by the Ophthalm-o-graph, an American Optical 
Co. instrument based on the usual corneal reflection method of eye 
photography. The eye-movement records were obtained in a private 
room, free from any distraction. Four eye-movement measures were 
secured: The number of regressions per hundred words, the number of 
fixations per hundred words, mean span of recognition, and mean 
fixation time. 
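
The four measures can be illustrated with a brief sketch. It assumes that the mean span of recognition is the number of words read per fixation, which is the usual definition but is not spelled out in the text; the record structure and field names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class ReadingRecord:
    """Hypothetical summary of one photographed reading record."""
    words_read: int
    fixation_durations: Sequence[float]  # seconds, one entry per fixation
    regressions: int                     # backward movements within a line


def eye_movement_measures(record: ReadingRecord) -> dict:
    """Compute the four measures named in the text from one record."""
    n_fixations = len(record.fixation_durations)
    return {
        "fixations_per_100_words": 100 * n_fixations / record.words_read,
        "regressions_per_100_words": 100 * record.regressions / record.words_read,
        # Assumed definition: words read per fixation.
        "mean_span_of_recognition": record.words_read / n_fixations,
        "mean_fixation_time": mean(record.fixation_durations),
    }


# Hypothetical record: 50 words, 50 fixations, 7 regressions.
sample = ReadingRecord(words_read=50,
                       fixation_durations=[0.2, 0.3] * 25,
                       regressions=7)
measures = eye_movement_measures(sample)
print(measures)
```

Under the assumed definition the span of recognition is simply 100 divided by the number of fixations per hundred words, so the two measures carry the same information.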

In the recording of the eye-movements, each subject read two 
selections. The first selection was considered a practice exercise in 
which the individual was to adapt himself to the experimental situ- 
ation. The eye-movement measures themselves were determined 
from the second selection the subjects read. This selection, one of
the standard selections furnished with the recording instrument, is of
moderate difficulty. Comprehension was checked by a standard
list of true-false questions for each selection. No significance was 
attached to the comprehension score from this test as it was used as a 
control for normal reading. 

The Keystone Telebinocular with the ‘a’ type of shaft calibrated 
for reading distance (near point), and for far point (infinity), was used 
in conjunction with the Betts Tests of Visual Sensation and Percep- 
tion.23 This arrangement served as the measuring device for the 
following ten factors of vision: (1) Distance fusion, (2) near-point 
fusion, (3) binocular visual efficiency, (4) left eye visual efficiency, 
(5) right eye visual efficiency, (6) coördination level, (7) lateral imbal-
ance at near point, (8) lateral imbalance at far point, (9) sharpness of
image at near point, and (10) sharpness of image at far point. 

All tests of visual sensation and perception were administered under 
constant conditions and scored in accordance with the directions set up 
by Betts. Since visual acuity is influenced by any refractive errors
which may be present, and may result in an erroneous measure of
acuity, all of the subjects who wore corrections regularly, also wore 
them during the examination of their visual efficiency. Thus, a 
measure of absolute acuity was obtained which was as accurate 
as possible under the existing conditions of the examination. The 
assumption was made that the corrections worn by the subjects were 
adequate. 

The presence of any relationship between reading rate and eye-
movement factors was established by the use of product-moment
correlations. This measure was determined for reading rate versus
number of fixations per hundred words, number of regressions per
hundred words, mean span of recognition, and mean fixation time. The
correlations for these measures are shown in Table I. Due to the type
of record obtained on the Ophthalm-o-graph these correlations serve,
in a sense, as a check on the validity of the records.


TABLE I. EYE-MOVEMENT FACTORS CORRELATED WITH READING SPEED IN WORDS
PER MINUTE AS MEASURED BY THE OPHTHALM-O-GRAPH

Eye-movement factors                                      r      PE
1. Number of fixations per 100 words.................   -.74   ±.03
3. Number of regressions per 100 words...............   -.55   ±.04
4. Mean fixation time in seconds.....................   -.46   ±.05
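
The product-moment coefficient and its probable error can be sketched as follows. The probable-error formula used here, 0.6745(1 - r²)/√N, is the conventional one of the period and is an assumption, since the text does not say how PE was obtained; N = 96 is the number of subjects reported above, and the paired readings are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def probable_error(r, n):
    """Conventional probable error of r (assumed formula, not from the source)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


# Hypothetical readers: words per minute and fixations per 100 words.
words_per_minute = [180, 220, 260, 300, 340]
fixations_per_100 = [120, 110, 95, 90, 80]
r_example = pearson_r(words_per_minute, fixations_per_100)

# Probable error for the first coefficient in Table I, taking N = 96.
pe_fixations = probable_error(-0.74, 96)
print(round(r_example, 2), round(pe_fixations, 3))
```

Faster readers make fewer fixations per hundred words, so the coefficient in the hypothetical data is negative, as in the table.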

Biserial correlations were determined between the number of 
fixations and regressions per hundred words and mean fixation time 
versus efficiency on each of the ten factors of vision tested. In order 
to obtain these correlations, it was necessary to dichotomize the scores
(“passing,” “doubtful,” and “failing”) on each test of vision into
“passing” and “failing” categories. This was accomplished by combining
the “doubtful” cases with the “failing” cases on the basis of
the interpretation of a “doubtful” rating as determined by the Betts
Tests. The correlations are given in Table II.
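
The dichotomy and the biserial coefficient may be sketched as below. The formula is the textbook biserial, r = (M₁ - M₂)/σ · pq/y, with y the ordinate of the normal curve at the point of division; the text does not give the computation, so this is an assumption. The sign is taken as failing mean minus passing mean, which appears to match the signs printed in Table II. All names and the sample data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def is_passing(rating: str) -> bool:
    """Dichotomize a rating: 'doubtful' is combined with 'failing'."""
    return rating == "passing"


def biserial(values, ratings):
    """Biserial correlation between a measure and a pass/fail dichotomy.

    Positive values mean the failing group has the higher mean (assumed
    sign convention).
    """
    pass_vals = [v for v, r in zip(values, ratings) if is_passing(r)]
    fail_vals = [v for v, r in zip(values, ratings) if not is_passing(r)]
    if not pass_vals or not fail_vals:
        raise ValueError("both categories must contain at least one case")
    p = len(pass_vals) / len(values)        # proportion passing
    q = 1 - p                               # proportion failing
    normal = NormalDist()
    y = normal.pdf(normal.inv_cdf(p))       # ordinate at the point of division
    sigma = pstdev(values)                  # S.D. of the whole group
    return (mean(fail_vals) - mean(pass_vals)) / sigma * (p * q / y)


# Hypothetical data: six subjects' scores on one eye-movement measure.
scores = [1, 2, 3, 4, 5, 6]
ratings = ["failing", "passing", "doubtful", "passing", "failing", "passing"]
r_bis = biserial(scores, ratings)
print(round(r_bis, 3))
```

Because the biserial coefficient assumes a normal variable underlying the dichotomy, it can exceed unity in small or sharply divided samples, a property worth remembering when the coefficients in Table II are interpreted.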

Table I ranks in order the number of fixations per hundred words, 
mean span of recognition, the number of regressions per hundred words, 
and mean fixation time, as to the highest correlation with reading 
speed as obtained by eye-movement photography. The correlations 
between span of recognition and the number of fixations per hundred 
words with speed of reading do not differ significantly since the one 
measure is a function of the other. 




Table II indicates that no significant relationship exists between 
the number of fixations per hundred words, and any of the ten visual 
factors measured. There is some positive relationship between mean 
fixation time and near-point fusion and some negative relationship 
between mean fixation time and far-point fusion. There is no signif- 
icant relationship between the other eight tests of the Betts battery 
and average fixation time. No significant relationship exists at the 
college level between any of the vision tests and number of fixations 
and regressions per one hundred words. 


TABLE II. THE RELATIONSHIP BETWEEN EYE-MOVEMENT MEASURES AND FACTORS
OF VISION

                                    Number of fixations         Number of regressions       Mean fixation time
                                    per 100 words               per 100 words               in seconds
Test of vision                      Bis*          Fail†  Pass†  Bis*          Fail†  Pass†  Bis*          Fail†  Pass†

Distance fusion                     +.09 ± [?]    103.6   98.6  +.03 ± .13     13.6   13.2  -.39 ± .11      .23    .26
Near-point fusion                   -.03 ± [?]     98.5   99.7  +.01 ± .13     14.7   14.6  +.47 ± .11      .26    .23
Binocular visual efficiency         +.01 ± [?]     99.3   99.0  -.14 ± [?]     12.3   14.7  -.10 ± .17      .25    .26
Left eye visual efficiency          -.13 ± [?]     94.1   99.8  -.32 ± .15     11.5   15.2  -.13 ± .18      .25    .26
Right eye visual efficiency         -.21 ± [?]     93.5  102.4  -.16 ± .14     12.8   15.1  -.09 ± .15      .25    .26
Coördination level                  [?] ± .12      98.2   98.7  -.00 ± .12     14.1   14.2  +.05 ± .13      .26    .25
Lateral imbalance at near point     -.00 ± [?]     99.0   99.1  -.11 ± .11     12.2   14.1  -.08 ± [?]      .25    .26
Lateral imbalance at far point      -.06 ± [?]     96.0   98.8  -.22 ± [?]     11.4   14.9  +.17 ± .16      .27    .25
Sharpness of image at near point    +.05 ± [?]    101.5   99.0  -.18 ± [?]     13.1   15.6  -.01 ± .13      .25    .26
Sharpness of image at far point     -.20 ± [?]     94.2  101.4  -.03 ± [?]     14.1   14.7  -.17 ± [?]      .24    .26


* Bis = Biserial coefficient of correlation. Estimate of error in terms of standard error.
† Mean for those passing or failing a test of vision.
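
One way to see which coefficients in the fixation-time columns stand out is to divide each by its standard error. The sketch below does so for the rows whose standard errors are legible in the table. The cutoff of three standard errors is an assumption borrowed from common practice of the period; the text does not state the criterion used. The label of the first row is missing in the printed table and is taken here to be distance fusion.

```python
# (biserial coefficient, standard error) for mean fixation time, Table II.
fixation_time_bis = {
    "Distance fusion": (-0.39, 0.11),        # row label assumed
    "Near-point fusion": (0.47, 0.11),
    "Binocular visual efficiency": (-0.10, 0.17),
    "Left eye visual efficiency": (-0.13, 0.18),
    "Right eye visual efficiency": (-0.09, 0.15),
    "Coordination level": (0.05, 0.13),
    "Lateral imbalance at far point": (0.17, 0.16),
    "Sharpness of image at near point": (-0.01, 0.13),
}

THRESHOLD = 3.0  # assumed cutoff: coefficient at least three standard errors


def critical_ratio(coefficient: float, standard_error: float) -> float:
    """Absolute size of a coefficient in units of its standard error."""
    return abs(coefficient) / standard_error


ratios = {name: critical_ratio(c, se)
          for name, (c, se) in fixation_time_bis.items()}
flagged = [name for name, ratio in ratios.items() if ratio >= THRESHOLD]

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
print("Flagged:", flagged)
```

Under this assumed cutoff only the two fusion tests are flagged, which is in line with the relationships singled out in the discussion of the table.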


The results of this study indicate that for the average individual 
an examination of his ocular condition will be of little value for any 
analysis of reading deficiency. It is conceivable that in cases of 
extreme deviation in refraction or coördination, reading difficulties will
occur. The visual error, however, will have been evident in other 
behavior situations. 

A battery of visual tests such as the Betts is of real value in the 
primary grades if it is used as a general visual test for determining cases 
that need correction. Many such cases do appear in the lower grades 
of school. Such tests, however, should be confined to tests of vision 
and not to the determination of the subject’s readiness to read or for 
diagnosis of the reading difficulty. 




If visual tests were of value in diagnosis of reading difficulties 
there should be evidence of relationship between some of the visual 
tests and some category of reading deficiency. Reading is a perceptual 
phenomenon and as such is dependent upon sensory functions. That 
does not imply that the sensory function is the paramount factor in 
perception. It is essential to reading, but is only part of several 
integrating functions that make up perception. 

The motor aspects of ocular behavior have been shown to be symp- 
toms of central functions, hence, it seems rather futile to attempt to 
correct the symptoms rather than the central fault. 


CONCLUSIONS 


The data of this study indicate that no significant relationship 
exists between the number of fixations and regressions per hundred 
words and any factor of vision measured by the Betts Test. 

A slight negative relationship exists between mean fixation time 
and distance fusion. 

A slight positive relationship exists between mean fixation time 
and near-point fusion. 

These correlations, although indicating a significant relationship, 
are of little value for the purposes of prediction, since they are of such 
small magnitude. 

These results agree with those of Swanson and Tiffin who found no 
significant correlations between the number of fixations, average dura- 
tion of fixation, and near-point fusion. 

The results of the present study give no support to the theory that 
eye-movement measures are related to peripheral characteristics of the 
eye as measured by the ten Betts tests of vision.


BIBLIOGRAPHY 


(1) Anderson, M., and Kelley, M.: “An inquiry into traits associated with read-
ing disability.” Smith College Studies in Social Work, Vol. ii, 1931, pp.
46-63.

(2) Betts, E. A.: “A physiological approach to reading disabilities.” Educ. Res.
Bull., 1934, No. 13, pp. 163-174.

(3) Betts, E. A.: The prevention and correction of reading disabilities. Chicago:
Row, Peterson & Co., 1936, pp. 402.

(4) Clark, B.: “The effect of binocular imbalance on the behavior of the eyes
during reading.” J. Educ. Psychol., Vol. xxvi, 1935, pp. 530-538.

(5) Clark, B.: “Heterophoria in college students.” Amer. J. Optom., Vol. xii,
1935, pp. 441-450.

(6) Clark, B.: “Eye-movement photography as a diagnostic method in the
determination of reading disability.” Amer. J. Optom., Vol. xiii, No. 12,
1936.

(7) Clark, B.: “The importance of the correction of ocular defects in a remedial
reading program. Results of recent research.” Amer. J. Optom., Vol. xii,
1935, pp. 169-175.

(8) Dearborn, W. F.: “The aetiology of so-called congenital word-blindness.”
Psychol. Bull., Vol. xxvi, 1929, pp. 178-179.

(9) Eames, T. H.: “The rôle of visual defects in reading disability.” The Mass.
Teacher, Vol. xii, No. 4, 1933, pp. 110-112.

(10) Eames, T. H.: “A comparison of the ocular characteristics of unselected and
reading disability groups.” J. Educ. Res., Vol. xxv, 1932, pp. 211-215.

(11) Eames, T. H.: “Low fusion convergence as a factor in reading disability.”
Amer. J. Opth., Vol. xvii, No. 8, 1934, pp. 709-710.

(12) Dodge, R.: “Visual perception during eye movements.” Psychol. Rev.,
Vol. vii, 1900, pp. 454-465.

(13) Eurich, A. C.: “Reliability and validity of photographic eye-movement
records.” J. Educ. Psychol., Vol. xxiv, 1933, pp. 118-122.

(14) Farris, L. P.: Visual defects as factors influencing achievement in reading.
Unpublished Ed. D. dissertation. Univ. of Calif., Berkeley, 1934.

(15) Fendrick, F. P.: “Visual characteristics of poor readers.” Teach. Coll.
Columbia Univ., Contrib. Educ., 1935, No. 656, pp. 54.

(16) Gates, A. I., and Bennett, C. C.: Reversal tendencies in reading. Causes,
diagnosis, prevention, and correction. New York: Bureau of Publications,
Teachers Coll., Columbia Univ., 1933, pp. 33.

(17) Gray, C. T.: “Types of reading ability as exhibited through tests and labora-
tory experiments.” Supp. Educ. Monog., Vol. v, 1917, pp. 122.

(18) Ruediger, W. C.: “The field of distinct vision.” Arch. of Psychol., Vol. v,
1907, pp. 67.

(19) Selzer, C. A.: “Lateral dominance and visual fusion; their application to
difficulties in reading, writing, spelling, and speech.” Harvard Monog.
Educ., 1933, No. 12. Cambridge, Mass.: Harvard University Press, 1933,
pp. 119.

(20) Stromberg, E. L.: Visual characteristics of fast and slow readers among college
students. Unpublished doctor’s dissertation, 1937, Univ. of Minn.

(21) Swanson, D. C., and Tiffin, J.: “Betts physiological approach to the analysis
of reading disabilities as applied to the college level.” J. Educ. Res., Vol.
xxix, 1936, pp. 433-448.

(22) Tinker, M. A.: “Photographic measures of reading ability.” J. of Educ.
Psychol., Vol. xx, 1929, pp. 184-191.

(23) Wagner, G. W.: “The maturation of certain visual functions and the rela-
tionship between these functions and success in reading and arithmetic.”
Psychol. Monog., Vol. xlviii, 1937, pp. 108-146.

(24) Witty, P. A., and Kopel, D.: “Heterophoria and reading disability.” J. Educ.
Psychol., Vol. xxvii, 1936, pp. 222-230.


CHANGES IN INTEREST WITH CHANGES IN GRADE 
STATUS OF ELEMENTARY-SCHOOL CHILDREN?’ 


WILLIAM McGEHEE 
North Carolina State College, Raleigh, North Carolina 


The hobbies of students in the elementary school apparently have 
been neglected by the majority of investigators who have concerned 
themselves with problems of the development of the interests of this 
group of children. These activities seem a fruitful source of informa- 
tion on children’s interests as they are behavior manifestations sympto- 
matic of the spontaneous interests of the child. The purpose of this 
investigation is to study the grade trends in the interests of elementary- 
school children as revealed by their possession of specific hobbies. 
It is hoped to secure a partial answer, in the first place, to the questions 
of whether interests in specific hobbies increase or decrease with change 
in grade status; in the second place, whether certain types of interests 
—specifically social or non-social, sedentary or non-sedentary, intel- 
lectual or non-intellectual—increase or decrease with change in grade 
status. 

The data on the hobbies of elementary-school children presented 
here are a part of the information secured by the Coördinated Studies 
in Education¹ in a survey involving three hundred ten communities 
and four hundred fifty-five schools in thirty-six states in the United 
States. The teachers of the children in these schools designated from 
a list of twenty-one hobbies (Table I) the hobby or hobbies which each 
child possessed. Data thus secured on the hobbies of the children 
in grades IV through VIII are the bases of this investigation. 

Over fifty thousand children in this survey are found in grades 
IV through VIII. No attempt has been made to study the hobbies of 
each child; instead a random sample involving one thousand children 
of each sex at each grade level has been used. It is believed that this 
sampling represents a fairly adequate picture of the hobbies of the 


1 The writer wishes to acknowledge his indebtedness to the Advisory Committee 
of the Coördinated Studies in Education, Incorporated, for the opportunity to use 
the data presented in this investigation. The members of the committee were: 
Paul L. Boynton, George Peabody College for Teachers; Harry A. Greene, Univer- 
sity of Iowa; LeRoy A. King, University of Pennsylvania; J. C. McElhannon, 
Baylor University; I. R. Obenchain, Birmingham Public Schools; Henry J. Otto, 
W. K. Kellogg Foundation; David Segel, United States Office of Education; and 
M. J. Van Wagenen, University of Minnesota. 




populations at the grade levels involved in this investigation. Evi- 
dence which supports this contention is found when the standard 
errors of the percentage of cases at each grade-sex level assigned each 
hobby are computed. None of these percentages is less than three 
times its standard error and the majority of them are much larger than 
this commonly accepted criterion of statistical reliability. 


TABLE I.—LIST OF HOBBIES 
NUMBER    HOBBIES 
1. Reading—novels, mysteries, etc. 
2. Reading—history, science, etc. 
3. Reading—funny papers, etc. 
4. Active games or sports—football, tennis, etc. 
5. Quiet games—checkers, jacks, etc. 
6. Playing musical instruments; not radio or phonograph. 
7. Listening to radio or phonograph. 
8. Sewing, knitting, fancy work, etc. 
9. Housework—cooking, sweeping, etc. 
10. Going to shows. 
11. Dramatics—participating. 
12. Playing make-believe games—teacher, store, church, etc. 
13. Religious activity. 
14. Building things, or shop work. 
15. Traveling. 
16. Driving car, riding in airplane. 
17. Studying. 
18. Working—farm, store, etc. 
19. Clubs—social, dancing, etc. 
20. Scouting; serious forms club activity. 
21. Collecting. 


The data in Table II show the per cent of subjects at each grade-sex 
level assigned each of the twenty-one hobbies. Trends in interest in 
each hobby can be estimated by comparing the per cent of subjects 
assigned the specific hobby at the various grade levels. For example, 
twenty-seven and nine-tenths per cent of the boys at the fourth-grade 
level were assigned the first hobby, “reading novels, etc.,” while 
thirty-nine and four-tenths per cent of the boys at the fifth-grade 
level were assigned it, a gain of eleven and five-tenths per cent. 
This difference meets the usually accepted three-sigma criterion of 
statistical reliability. The remainder of the table is to be 
interpreted in a similar manner. 
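
The comparison just described can be checked with a short calculation. The sketch below is only an illustration: it assumes that the standard error of a difference between two independent percentages is obtained from the unpooled formula sqrt(p1*q1/n1 + p2*q2/n2), which the article does not state, and the function name is hypothetical. The numbers of cases (1133 and 1003) are those given in the last row of Table II.

```python
from math import sqrt


def sigma_ratio(pct_a, n_a, pct_b, n_b):
    """Return (pct_b - pct_a) divided by the standard error of the difference.

    Assumes two independent samples and the unpooled standard-error
    formula; the article does not say which formula its author used.
    """
    p_a, p_b = pct_a / 100.0, pct_b / 100.0
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return (p_b - p_a) / se_diff


# Boys assigned "reading novels, etc.": grade IV versus grade V.
ratio = sigma_ratio(27.9, 1133, 39.4, 1003)
meets_three_sigma = abs(ratio) >= 3  # the three-sigma criterion
print(round(ratio, 2), meets_three_sigma)
```

Under these assumptions the difference is well over three times its standard error, which agrees with the statement in the text.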

The data in Table II should be interpreted in the light of both 
statistically reliable differences and of the consistency of differences 
from grade to grade. Even where differences do not meet commonly 
accepted criteria of statistical reliability, a consistent change from 
grade to grade offers some evidence of a trend in interest in respect to 
a given hobby. The following interpretation of the data in Table II 
has been made in light of these considerations. 



TABLE II.—TRENDS BY GRADES IN INTERESTS IN TERMS OF PERCENTAGE OF 
SUBJECTS AT GRADE LEVEL ASSIGNED EACH HOBBY 

Hobby | Boys IV | Boys V | Boys VI | Boys VII | Boys VIII | Girls IV | Girls V | Girls VI | Girls VII | Girls VIII 
1. Reading novels, etc. | 27.9 | 39.4¹ | 40.1 | 44.2 | 48.9² | 49.8 | 47.1 | 60.2¹ 
2. Reading history, etc. | 15.4 | 18.8² | 21.8 | 13.4¹ | 17.0 | 14.4 | 17.6 | 18.7 | 11.7¹ | 14.2² 
3. Reading comics, etc. | 40.7 | 45.5² | 48.7 | 49.4 | 52.5 | 41.5 | 51.3¹ | 47.3 | 49.7 | 55.0² 
4. Active games | 53.7 | 59.7² | 56.3 | 69.5 | 64.4² | 37.1 | 40.9 | 44.4 | 55.1¹ | 64.5¹ 
5. Quiet games | 22.1 | 23.5 | 24.1 | 24.2 | 21.3 | 26.7 | 31.1² | 28.4 | 26.5 | 29.4 
6. Playing musical instruments | 9.8 | 14.1¹ | 17.1 | 15.1 | 19.7¹ | 14.8 | 21.6² | 22.5 | 23.0 | 29.4¹ 
7. Listening to radio, etc. | 33.6 | 29.6 | 30.3 | 24.2¹ | 36.3¹ | 30.3 | 31.0 | 37.3¹ | 40.2 | 47.1¹ 
8. Sewing, etc. | 1.9 | 1.8 | 2.0 | 2.3 | 1.4 | 37.4 | 38.7 | 38.5 | 38.7 | 39.9 
9. Housework | 7.9 | 7.4 | 7.1 | 6.4 | 5.4 | 35.8 | 32.4 | 36.6 | 42.2² | 43.8 
10. Going to shows | 29.3 | 35.3¹ | 33.1 | 36.7 | 36.0 | 24.7 | 31.8¹ | 34.1 | 36.5 | 43.7¹ 
11. Dramatics—participating | 4.4 | 2.3 | 5.3 | 5.4 | 5.6 | 8.5 | 8.6 | 14.3¹ | 13.1 | 17.0² 
12. Playing make-believe games | 12.9 | 15.1 | 12.7 | 6.0¹ | 39.0 | 34.4² | 21.2¹ | 16.7² | 5.7¹ 
13. Religious activity | 12.9 | 16.2² | 13.3 | 16.9² | 12.8 | 18.5 | 20.7 | 19.7 | 20.5 | 24.1 
14. Building things | 29.9 | 34.1¹ | 35.1 | 35.2 | 41.3² | 3.5 | 7.1¹ | 5.1 | 3.2² | 2.5¹ 
15. Traveling | 5.8 | 9.5¹ | 12.9² | 11.3 | 13.4 | 5.2 | 7.6² | 8.9 | 13.5² | 18.6¹ 
16. Driving cars, riding in airplane | 3.3 | 3.3 | 14.7¹ | 13.9 | 16.0 | 0.8 | 1.2 | 3.7 | 5.8² | 9.2¹ 
17. Studying | 6.8 | 6.3 | 5.4 | 5.8 | 6.6 | 10.5¹ | 7.6¹ | 7.1 | 11.9¹ 
18. Working; store, etc. | 10.7 | 16.4¹ | 17.9 | 21.1 | 20.2 | 2.6 | 5.7¹ | 5.8 | 6.6 | 7.3 
19. Clubs: Social | 1.7 | 1.9 | 2.9 | 2.5 | 4.9² | 3.9 | 4.6 | 6.2² | 11.2¹ | 17.0¹ 
20. Scouting, etc. | 2.6 | 12.3¹ | 14.8 | 16.5 | 1.1 | 3.0 | 7.3¹ | 11.3¹ | 25.8¹ 
21. Collecting | 15.7 | 14.1 | 16.6 | 19.6 | 22.7 | 10. | 14.5¹ | 16.9² | 21.2² | 26.9¹ 
Number of cases | 1133 | 1003 | 1000 | 1000 | 1000 | 1055 | 1004 | 1000 | 1000 | 1000 

1 Indicates that the difference between the per cent thus marked and the per cent in the preceding 
grade is statistically reliable. 

2 Indicates that for the difference thus marked there are about ninety-eight chances out of a hundred of a 
true difference existing. 


No clear-cut trends in change of interest with change of grade 
status for either boys or girls are found in the hobbies reading novels, 
etc., quiet games, sewing, and religious activities. No trends are found 
for boys in the hobbies listening to radio, etc., dramatics, and studying. 
Boys show a consistent decrease in interest in the hobby housework, 
while girls show a similar trend in the hobby playing make-believe 
games. 

Both boys and girls tend to show an increase in interest with 
increase in grade status in working, social clubs, and scouting. Boys 
alone show a similar tendency in the hobby reading comics; girls show a 
similar tendency in the hobbies active games, playing musical instru- 
ments, listening to the radio, etc., going to shows, traveling, driving cars, 
etc., and collecting. 

Both boys and girls show an increase in interest in light reading at 
the fifth-grade level in comparison with the fourth grade. Interest 
in this hobby for both sexes increases again at the eighth-grade level. 
Girls show an increased interest in reading comics at the fifth-grade 
level, but no consistent trend thereafter. 

Boys show an increased interest at the fifth-grade level in the 
hobbies active games, going to shows, playing musical instruments, 
building things, and traveling. Slight increases in interest in 
active games at the seventh-grade level, and in playing musical 
instruments and building things at the eighth-grade level, are also 
found for boys, while interest in going to shows and traveling remains 
relatively constant from the fifth through the eighth grade. Boys 
show a definite decrease in interest in playing make-believe games 
after the sixth-grade level is passed, and an increase in interest 
with increase in grade status in driving cars, etc., up to the 
seventh-grade level. No consistent trend for boys in the hobby 
collecting is found, although there is a tendency towards increased 
interest with increase in grade status from grades VI through VIII. 

Girls show no marked trend in interest in housework, although 
there is an increase at the seventh-grade level. Likewise, for girls 
there is a trend, not clear cut, for increase in interest in dramatics at the 
later two grade levels in comparison with the first two. There is a con- 
sistent decrease in interest on the part of girls in building things after 
the fifth-grade level is passed. 

It seems, then, from the data presented that changes in interest 
with changes in grade status in terms of possession of certain hobbies are 
characteristic of elementary-school children at the grade levels involved 
in this investigation. Does it follow that hobbies can be classified 
according to the types of interests they represent and grade trends in 
these types be found? 



An attempt has been made to classify the hobbies in this study into 
the categories of social, non-social, sedentary, non-sedentary, 
intellectual, and non-intellectual. A questionnaire was sent to 
thirty-five psychologists and educators, whose major interest was in 
child development, asking them to indicate whether they considered 
each of the twenty-one hobbies as being predominantly (1) social or 
non-social, (2) sedentary or non-sedentary, (3) intellectual or 
non-intellectual, (4) artistic or non-artistic, and (5) a source of 
direct or a source of vicarious satisfaction¹ in the case of children 
in the elementary grades who may possess the specific hobby under 
consideration. These individuals were also told that, “It is possible 
some of the hobbies can not be placed in all the alternate categories. 
If this happens, check only the alternates in the remaining categories.” 


TABLE III.—PERCENTAGE OF HOBBIES AT GRADE-SEX LEVEL CLASSIFIED 
IN THE STATED CATEGORIES AND OF THE HOBBIES WHICH CAN BE CLASSIFIED 
IN NEITHER OF TWO OPPOSITE CATEGORIES 

Category | Boys IV | Boys V | Boys VI | Boys VII | Boys VIII | Girls IV | Girls V | Girls VI | Girls VII | Girls VIII 
Social | 35.0 | 35.8 | 33.6 | 36.7 | 31.5¹ | 33.8 | 32.2 | 31.1 | 31.6 | 31.3 
Non-social | 39.2 | 37.1 | 39.6 | 36.4 | 39.9 | 46.0 | 40.7¹ | 38.2 | 35.1¹ | 33.6 
Neither social or non-social | 25.2 | 27.1 | 26.8 | 26.9 | 28.6 | 20.2 | 29.1¹ | 29.7 | 33.3 | 35.1 
Sedentary | 50.5 | 49.5 | 46.7 | 42.5¹ | 43.5 | 58.4 | 56.1 | 54.0 | 50.5¹ | 49.9 
Non-sedentary | 37.3 | 37.8 | 37.7 | 39.4 | 36.9 | 33.6 | 31.3 | 31.5 | 33.9 | 32.7 
Neither sedentary or non-sedentary | 12.2 | 12.6 | 15.6 | 18.1¹ | 19.6 | 8.0 | 12.6 | 14.5 | 15.6 | 17.4 
Intellectual | 12.1 | 12.2 | 13.1 | 12.4 | 12.8 | 6.6 | 8.2¹ | 8.7 | 7.8 | 8.0 
Non-intellectual | 31.2 | 30.0 | 30.5 | 32.9 | 31.0 | 38.6 | 36.5 | 37.3 | 39.4 | 37.6 
Neither intellectual or non-intellectual | 56.7 | 57.8 | 56.4 | 54.7 | 56.6 | 54.8 | 55.3 | 54.0 | 52.8 | 54.4 
Number of hobbies | 3953 | 4068 | 4306 | 4373 | 4669 | 4294 | 4647 | 4743 | 5097 | 6040 


1 Indicates a statistically reliable difference between the per cent thus marked and the per cent 
in the preceding grade level in the same category. 


1 The last two categories (4 and 5) due to the nature of the distribution of the 
responses have not been used in this investigation. 

Twenty responses were received to the questionnaire. When 
three-fourths of the respondents agreed that a hobby should be placed 
in a specific category, it was thus classified. On this basis the social 
hobbies are hobbies number 4, 5, 11, 12, 13, 18, 19, and 20; non-social 
hobbies are numbers 1-3, 5, 7, 8, and 10; sedentary, numbers 1-3, 5, 7, 
8, 10, and 17; non-sedentary, numbers 4, 9, 11, 12, 14, 15, and 18-20; 
intellectual, numbers 2, 18, and 21; and non-intellectual, 3, 4, 8, 9, 
16, and 19. 
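
The three-fourths rule used for classification can be stated as a small procedure. The sketch below is an illustration only: the vote counts are invented, the function name is hypothetical, and it assumes the three-fourths fraction was taken over all twenty respondents (the article does not say how blank responses were counted).

```python
from fractions import Fraction


def assign_category(votes, n_respondents=20, threshold=Fraction(3, 4)):
    """Return the category chosen by at least `threshold` of respondents.

    `votes` maps a category label to the number of respondents who
    checked it. Returns None when no category reaches the threshold,
    i.e. the hobby falls in "neither" category of the pair.
    """
    for label, count in votes.items():
        if Fraction(count, n_respondents) >= threshold:
            return label
    return None


# Hypothetical tallies, not data from the study.
clear_case = assign_category({"social": 16, "non-social": 3})
split_case = assign_category({"intellectual": 11, "non-intellectual": 9})
print(clear_case, split_case)
```

With twenty respondents the rule amounts to requiring at least fifteen concurring judgments, which is why many hobbies end up in neither member of a pair.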

The percentages of the total number of hobbies found at each 
grade-sex level which were assigned to each of the six categories in 
this investigation were computed. These data, as well as the per cent 
of hobbies which could be assigned to neither member of each of the 
three contrasting pairs of categories, are shown in Table III. It is 
apparent, from these 
data, that no clear-cut trends in change of type of interest as repre- 
sented by the classification in Table III are found. The differences 
in per cents at the various grade levels are, in the main, small, incon- 
sistent, and lack statistical reliability. Even where there are sugges- 
tions of trends (decrease in non-social hobbies for girls with increase 
in grade status) according to one category, no corresponding increase 
in the opposite category (social hobbies) is found; rather, any 
increase seems to come in hobbies which are neither predominantly 
social nor non-social. 

It seems, then, on the basis of the data presented in this study, 
that the tentative conclusion may be drawn that grade trends in 
interests of elementary-school children do exist but are functions of 
specific activities (a particular hobby) rather than of general types of 
interests, such as those into which the hobbies in this investigation 
have been classified. It may be suggested, further, that an analysis 
of children’s hobbies starting at lower grade levels than those used in 
the present investigation and extending into the secondary-school 
levels would furnish interesting and useful information on the change 
in children’s interests with changes in grade status. 




THE RELIABILITY OF THE FERSON-STODDARD LAW 
APTITUDE EXAMINATION 


FREDERICK J. GAUDET AND BRITTEN L. RIKER 
University of Newark 


Mortality and turn-over in law schools have long been problems 
of concern to institutions of legal learning and the legal professions, 
as well as to individual students of law. In the early twenties when 
the better class of law schools began requiring two or more years of 
college work as prerequisites to admission, the question of selection of 
students was considered solved by many. However, the continuing 
high mortality among law school students as well as among those 
taking bar examinations following graduation, indicated that the 
problem was still present. 

In order to lessen this mortality and to guide prospective law 
school students, a legal aptitude test was devised by Dean Ferson of 
the University of North Carolina Law School and Professor Stoddard. 
There have been several studies of the validity of this test in the 
fifteen years of its existence, which demonstrate its worth as a predic- 
tive device. Stoddard has shown that it has a higher validity than 
tests of general intelligence in predicting law school grades,¹ scores 
correlating +.55 and +.54 with first semester and entire first year 
scholarship, respectively. 

The test was revised in 1927 and a study of this version gave a 
coefficient of correlation of +.42 between freshman grades in law 
school and test scores.² Several other minor studies of the validity 
of the test have also been made.³ 


1 Stoddard, George D.: “Ferson and Stoddard Law Aptitude Examination— 
A Preliminary Report.” Amer. Law School Rev., Vol. vi, No. 2, March, 1927, 
pp. 78-81. 

2 Gaudet, Frederick J. and Marryott, Franklin J.: “Predictive Value of the 
Ferson-Stoddard Law Aptitude Examination.” Amer. Law School Rev., Vol. 
vii, No. 1, Dec., 1930, pp. 27-32. 

3 Wigmore, John H.: “Juristic Psychopoyemetrology—or, How to Find Out 
Whether a Boy Has the Makings of a Lawyer.” Ill. Law Rev., Vol. xxiv, 
No. 4, Dec. 1929, pp. 454-465; Wigmore, John H.: “Tests of Legal Aptitude.” 
Ill. Law Rev., Vol. xxiv, No. 6, Feb. 1930, pp. 680-683; Crawford, Albert 
B.: “Letter to the Editors.” Ill. Law Rev., Vol. xxiv, No. 7, Mar. 1930, pp. 
801-866. 



THE PRESENT STUDY 


From the foregoing we see that this is a test which has a long 
history of validation. Normally one seeks to determine the reliability 
of a test before ascertaining its validity, but in spite of its many 
appearances in the literature there is still no measure of its reliability 
in published form.¹ It may be said that it is not unusual to find tests 
of special aptitudes in which the reliability is not known. For 
instance, in the chapter entitled “Tests in Special Fields” in Garrett 
and Schneck’s Psychological Tests, Methods, and Results² there are 
twenty-one tests described, of which only thirteen have measures of 
reliability. Furthermore, several of these tests in which the coeffi- 
cient of reliability is stated do not indicate the number of cases upon 
which this measure is based, nor do they always indicate the technique 
used in obtaining this coefficient. 

Since, as we have already seen, the validity of the Ferson-Stoddard 
test for legal aptitude is sufficiently high to merit serious 
consideration, it becomes important to know its reliability. Certainly 
anyone interested in attempting to increase the validity of the test 
would want to 
know how consistently it measures whatever it is measuring before 
doing further work on it. It was the purpose of the present investiga- 
tion to determine a measure of the reliability of this testing instrument. 

The examinations used in this study were taken by two hundred 
freshmen entering the New Jersey Law School⁴ in the Fall semesters 
of 1929, 1930 and 1931. All of these students had completed at least 
two years of college training, and many had had three or four years. 

Two measures of reliability were calculated, one on the weighted 
scores and one on the basis of the raw number of correct items. As 
stated above, the present version of the Ferson-Stoddard Law Aptitude 
Examination is made up of four parts, in which the scoring varies as 
follows: For Part 1 the score is the number correct multiplied by 2; 
the scores for Parts 2 and 3 are simply the number of correct answers; 


1 In correspondence received in January, 1940, both Stoddard and Ferson 
report that a measure of reliability has never been calculated. 

2 Garrett and Schneck. Psychological Tests, Methods, and Results, New York: 
Harper Bros., 1933. 

3 All of these tests are not aptitude tests. However, the Ferson-Stoddard is 
not described as being a pure aptitude test, but a combination of an aptitude and 
placement test. See Ferson, Merton L.: “Law Aptitude Examination.” Amer. 
Law School Rev., Vol. v, No. 10, Dec. 1925, pp. 563-565. 

4 Now the Law School of the University of Newark. 




and the score for Part 4 is obtained by multiplying each correct 
response by 3. In calculating both coefficients of reliability, the split- 
half method was used. 
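
The scoring weights and the odd-even split described above can be written out as follows. This is a minimal sketch under stated assumptions: the function names and the example answer data are hypothetical, and the article does not say how items were numbered across parts when the odd and even halves were formed.

```python
# Weights per part as stated in the text: Part 1 x2, Parts 2 and 3 x1, Part 4 x3.
PART_WEIGHTS = {1: 2, 2: 1, 3: 1, 4: 3}


def weighted_score(correct_by_part):
    """Total weighted score from the number of correct items in each part."""
    return sum(PART_WEIGHTS[part] * n for part, n in correct_by_part.items())


def odd_even_totals(item_marks):
    """Split a list of 0/1 item marks into odd-item and even-item totals.

    Items are counted from 1, so list index 0 is item 1 (odd).
    """
    odd_total = sum(item_marks[0::2])
    even_total = sum(item_marks[1::2])
    return odd_total, even_total


# Hypothetical examinee, not data from the study.
example_counts = {1: 10, 2: 20, 3: 15, 4: 5}
example_marks = [1, 0, 1, 1, 0, 1]
total = weighted_score(example_counts)
halves = odd_even_totals(example_marks)
print(total, halves)
```

Correlating the odd-item totals with the even-item totals over all examinees gives the half-test correlation to which the Spearman-Brown formula is then applied.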

The coefficient of correlation between the scores on the odd-and- 
even items when weighted was +.86. Applying the Spearman-Brown 
prophecy formula to this correlation, we get an estimated coefficient of 
reliability of +.92.¹ This high reliability certainly compares favorably 
with the reliability of most aptitude tests in special fields. 

The calculation of the reliability of the test on the basis of the 
number of items correct without weighting yielded a coefficient of 
+.95, which, when calculated for the probable reliability of the whole 
test, became +.97. These figures should be of value in more detailed 
studies of this test in terms of item analysis and the weighting of the 
scores which are planned by the present authors. 
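
The step from the half-test correlations to the whole-test estimates follows the Spearman-Brown prophecy formula, r_whole = 2r / (1 + r) for a test of doubled length. The sketch below applies it to the two correlations reported above; the function name and the general `length_factor` parameter are illustrative additions, not part of the article.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Estimate reliability of a test lengthened by `length_factor`.

    With length_factor = 2 this is the usual split-half correction:
    r_whole = 2 * r_half / (1 + r_half).
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


weighted = spearman_brown(0.86)    # odd-even correlation, weighted scores
unweighted = spearman_brown(0.95)  # odd-even correlation, raw item counts
print(round(weighted, 2), round(unweighted, 2))
```

Rounded to two places, these reproduce the +.92 and +.97 given in the text.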

In the original 1925 version of the test Stoddard found that all 
parts were not equally valid. He said: 


Computations based on University of Iowa data indicated that for the 
first semester Part 1 (Memory) is of most value, and Part 3 (Reasoning by 
Analogy) of least value. But for estimating average first-year grades in all 
subjects, Part 2 (Reading Comprehension) proved best and Part 3 worst.² 


In order to determine the relationship between the individual parts 
of the test among themselves and between them and the total test 
score, appropriate correlations were computed on the same two 
hundred tests using the weighted scores. These inter-correlations are 
presented in Table I. It will be observed that the correlations between 
the various parts of the test and the total score are fairly high, ranging 
from +.58 to +.71. Moreover, the inter-correlations among the parts 
of the test (from +.34 to +.52) are lower than one usually finds, 
which indicates that the parts of the test do not measure the same 
mental functions in spite of the high validity of the test as a whole.¹ 


TABLE I.—INTER-CORRELATIONS OF THE PARTS OF THE TEST WITH ONE ANOTHER 
AND WITH TOTAL SCORES 


Part 1 | Part 2 | Part 3 | Part 4 | Total 


2 Stoddard, op. cit., p. 80. 


SUMMARY 


Several studies have shown the Ferson-Stoddard Law Aptitude 
Examination to yield valid predictions of success in law school. For 
some time it has been known that it was a valid measure of legal 
aptitude but unfortunately nothing was known about the consistency 
with which it measured it. Since without a known measure of relia- 
bility a testing instrument is of questionable predictive value, the 
present study was intended to fill this gap. It was found that the 
test showed high reliability or consistency (+.92 to +.97) which should 
stimulate its use and warrant further research in the field. 


1 For instance, the average of the inter-test correlations of the Army Alpha is 
+.73. See Garrett and Schneck, op. cit., pp. 44-45. 




